SiamGPT: Quality-First Fine-Tuning for Stable Thai Text Generation
By: Thittipat Pairatsuppawat, Abhibhu Tachaapornchai, Paweekorn Kusolsomboon, and more
Potential Business Impact:
Makes Thai language AI understand instructions better.
Open-weights large language models remain difficult to deploy for Thai due to unstable generation under complex instructions, despite strong English performance. To mitigate these limitations, we present SiamGPT-32B, an open-weights model based on Qwen3-32B and fine-tuned with a Quality-First strategy that emphasizes curated supervision over data scale. The fine-tuning pipeline combines translated high-complexity English instruction data with a Thai-adapted AutoIF framework for instruction-following and linguistic constraints. Using supervised fine-tuning only, without continual pretraining or corpus expansion, SiamGPT-32B improves instruction adherence, multi-turn robustness, and linguistic stability. Evaluations on the SEA-HELM benchmark show that SiamGPT-32B achieves the strongest overall performance among similar-scale open-weights Thai models, with consistent gains in instruction following, multi-turn dialogue, and natural language understanding.
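To make the supervised-fine-tuning-only setup concrete, below is a minimal sketch of such a pipeline using the Hugging Face `transformers` and `datasets` libraries. The abstract does not publish SiamGPT's training code or hyperparameters, so the data file name (`curated_thai_sft.jsonl`), sequence length, learning rate, and other settings here are illustrative assumptions, not the authors' configuration; only the base model name (Qwen3-32B) comes from the abstract.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

MODEL = "Qwen/Qwen3-32B"  # base model named in the abstract
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype="auto")

# Hypothetical curated instruction set: each record holds a "messages" list of
# chat turns ({"role": ..., "content": ...}). The real SiamGPT mix (translated
# high-complexity English instructions plus Thai-adapted AutoIF constraint data)
# is not included here.
dataset = load_dataset("json", data_files="curated_thai_sft.jsonl", split="train")

def tokenize_conversation(example):
    # Render each conversation with the model's chat template so the
    # fine-tuning format matches inference-time prompting.
    text = tokenizer.apply_chat_template(example["messages"], tokenize=False)
    return tokenizer(text, truncation=True, max_length=4096)

tokenized = dataset.map(tokenize_conversation, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="siamgpt-sft",          # illustrative output path
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        learning_rate=1e-5,                # assumed, not from the paper
        num_train_epochs=2,
        bf16=True,
    ),
    train_dataset=tokenized,
    # Causal-LM collator (mlm=False) builds next-token labels from the inputs.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The point of the sketch is the shape of the approach the abstract describes: a single supervised fine-tuning pass over a small, curated instruction corpus, with no continual pretraining stage and no additional Thai corpus expansion.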
Similar Papers
Adapting Large Language Models to Low-Resource Tibetan: A Two-Stage Continual and Supervised Fine-Tuning Study
Computation and Language
Teaches computers to understand Tibetan language better.
Narrowing the Gap: Supervised Fine-Tuning of Open-Source LLMs as a Viable Alternative to Proprietary Models for Pedagogical Tools
Computers and Society
Teaches computers to explain coding mistakes better.
OpenThaiGPT 1.6 and R1: Thai-Centric Open Source and Reasoning Large Language Models
Computation and Language
Makes computers understand and talk Thai better.