Statistical Comparative Analysis of Semantic Similarities and Model Transferability Across Datasets for Short Answer Grading

Published: August 19, 2025 | arXiv ID: 2508.15837v1

By: Sridevi Bonthu, S. Rama Sree, M. H. M. Krishna Prasad

Potential Business Impact:

Existing trained language models could be reused on new text-grading tasks, reducing the cost of dataset-specific retraining.

Developing dataset-specific models involves iterative fine-tuning and optimization, incurring significant costs over time. This study investigates the transferability of state-of-the-art (SOTA) models trained on established datasets to an unexplored text dataset. The key question is whether the knowledge embedded in SOTA models trained on existing datasets can be harnessed to achieve high performance in a new domain. To investigate this, two well-established benchmarks, the STSB and Mohler datasets, are selected, while the recently introduced SPRAG dataset serves as the unexplored domain. Using robust similarity metrics and statistical techniques, a detailed comparative analysis of these datasets is conducted. The primary goal of this work is to yield comprehensive insights into the applicability and adaptability of SOTA models. The outcomes of this research could reshape the landscape of natural language processing (NLP) by enabling existing models to be leveraged across diverse datasets. This may reduce the demand for resource-intensive, dataset-specific training, thereby accelerating advances in NLP and paving the way for more efficient model deployment.
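The abstract does not spell out the exact comparison pipeline, but the idea of comparing datasets via similarity metrics and statistical tests can be sketched as follows. This is a minimal illustrative example, not the authors' method: it computes cosine similarities between paired embedding vectors for a "known" and a "new" dataset (random toy vectors stand in for real sentence-encoder outputs), then applies a two-sample Kolmogorov-Smirnov test to check whether the two similarity distributions differ.

```python
import numpy as np
from scipy import stats


def cosine_sim(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy stand-in embeddings; in practice these would come from a sentence
# encoder applied to answer pairs in, e.g., STSB/Mohler vs. SPRAG.
rng = np.random.default_rng(0)
known = rng.normal(0.0, 1.0, size=(200, 16))  # established-dataset embeddings
new = rng.normal(0.3, 1.0, size=(200, 16))    # unexplored-dataset embeddings

# Pairwise similarity of consecutive rows, treated as (reference, answer) pairs.
sims_known = [cosine_sim(known[i], known[i + 1]) for i in range(0, 198, 2)]
sims_new = [cosine_sim(new[i], new[i + 1]) for i in range(0, 198, 2)]

# Two-sample KS test: are the similarity distributions drawn from the
# same underlying distribution? A small p-value suggests domain shift.
stat, p = stats.ks_2samp(sims_known, sims_new)
print(f"KS statistic = {stat:.3f}, p-value = {p:.3f}")
```

A large KS statistic (with a small p-value) would indicate that the new dataset's similarity profile differs from the established one, hinting that direct model transfer may degrade.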

Page Count
9 pages

Category
Computer Science:
Computation and Language