We Need Knowledge Distillation for Solving Math Word Problems
By: Zhenquan Shen, Xinguo Yu, Xiaotian Cheng, and more
Potential Business Impact:
Makes smart math tutors use less computer power.
Improving the mathematical capabilities of large language models (LLMs) opens new possibilities for mathematics education in primary and secondary schools, particularly for intelligent tutoring systems. However, LLMs require substantial computational resources, which makes them costly to deploy in educational settings. To mitigate this drawback, this paper investigates the feasibility of compressing LLMs for solving math word problems (MWPs). We compress the embedding vectors encoded by BERT and distill a considerably smaller student model. Our findings indicate that the student model retains nearly 90% of the teacher model's performance while using only 1/12 of its parameters. Beyond accuracy, the model generalizes well: the compressed vectors perform well across all MWP-related tasks, and the distillation process is not task-specific, suggesting that the underlying principles are generic rather than tied to a single task. We further explore why the embedding vectors are compressible, finding that part-of-speech information, rather than entity recognition, is crucial for MWPs, which may largely account for their compressibility. These gains in efficiency and cost reduction offer substantial value for intelligent tutoring systems and advance the field of intelligent education.
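To make the abstract's pipeline concrete, here is a minimal, hypothetical sketch of the idea: a large teacher encodes an input into a BERT-sized vector, that vector is projected into a low-dimensional space, and a student with roughly 1/12 of the teacher's parameters is trained to reproduce the compressed vector with a mean-squared-error distillation loss. All names, dimensions, and the linear "models" below are illustrative assumptions for this sketch, not the authors' actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

IN_DIM, TEACHER_DIM, STUDENT_DIM = 100, 768, 64  # 768 = BERT-base hidden size

# Frozen teacher and compression projection (random stand-ins for illustration).
teacher = rng.normal(size=(TEACHER_DIM, IN_DIM)) / np.sqrt(IN_DIM)
proj = rng.normal(size=(STUDENT_DIM, TEACHER_DIM)) / np.sqrt(TEACHER_DIM)

# Student: 64*100 = 6,400 parameters vs the teacher's 76,800 — a 1/12 ratio.
student = np.zeros((STUDENT_DIM, IN_DIM))

def distill_target(x: np.ndarray) -> np.ndarray:
    """Compressed teacher embedding: the student's regression target."""
    return proj @ (teacher @ x)

def mse(a: np.ndarray, b: np.ndarray) -> float:
    """Mean-squared-error distillation loss."""
    return float(np.mean((a - b) ** 2))

# Toy training loop: per-sample gradient descent on the distillation loss.
X = rng.normal(size=(32, IN_DIM))
lr = 0.05
for _ in range(500):
    for x in X:
        residual = student @ x - distill_target(x)
        student -= lr * (2 / STUDENT_DIM) * np.outer(residual, x)

loss = float(np.mean([mse(student @ x, distill_target(x)) for x in X]))
```

Because the targets here are themselves linear in the input, the student fits them almost exactly; in the paper's setting the student is a real model trained on MWP data, and it recovers most but not all of the teacher's accuracy.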
Similar Papers
Exploring the Limits of Model Compression in LLMs: A Knowledge Distillation Study on QA Tasks
Computation and Language
Makes smart computer programs smaller and faster.
A Diversity-Enhanced Knowledge Distillation Model for Practical Math Word Problem Solving
Computation and Language
Teaches computers to solve math problems better.
Elementary Math Word Problem Generation using Large Language Models
Computation and Language
Makes math problems for students automatically.