Statistical Comparative Analysis of Semantic Similarities and Model Transferability Across Datasets for Short Answer Grading
By: Sridevi Bonthu, S. Rama Sree, M. H. M. Krishna Prasad
Potential Business Impact:
Reuses old computer smarts for new text tasks.
Developing dataset-specific models involves iterative fine-tuning and optimization, incurring significant costs over time. This study investigates the transferability of state-of-the-art (SOTA) models trained on established datasets to an unexplored text dataset. The key question is whether the knowledge embedded within SOTA models trained on existing datasets can be harnessed to achieve high performance on a new domain. To pursue this question, two well-established benchmarks, the STSB and Mohler datasets, are selected, while the recently introduced SPRAG dataset serves as the unexplored domain. A meticulous comparative analysis of these datasets is conducted using robust similarity metrics and statistical techniques. The primary goal of this work is to yield comprehensive insights into the potential applicability and adaptability of SOTA models. The outcomes of this research have the potential to reshape the landscape of natural language processing (NLP) by unlocking the ability to leverage existing models across diverse datasets. This may reduce the demand for resource-intensive, dataset-specific training, thereby accelerating advances in NLP and paving the way for more efficient model deployment.
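The abstract does not include code, but the transferability check it describes can be illustrated with a minimal sketch: score unseen sentence pairs with a model fine-tuned on STSB and correlate the scores with human labels. This assumes the sentence-transformers and scipy packages; the model name, the toy sentence pairs, and the gold labels below are illustrative placeholders, not data from the paper.

```python
# Minimal sketch: probe how well a model fine-tuned on one benchmark
# (STSB) transfers to sentence pairs from an unseen domain.
from scipy.stats import spearmanr
from sentence_transformers import SentenceTransformer, util

# A publicly available model fine-tuned for semantic textual similarity on STSB.
model = SentenceTransformer("sentence-transformers/stsb-roberta-base-v2")

# Hypothetical short-answer pairs with human similarity labels (0-1 scale),
# standing in for SPRAG-style data.
pairs = [
    ("A stack is a LIFO data structure.", "Stacks follow last-in, first-out order."),
    ("A stack is a LIFO data structure.", "A queue removes the oldest element first."),
    ("Recursion is a function calling itself.", "A recursive function invokes itself."),
]
gold = [0.9, 0.2, 0.95]

# Embed both sides of each pair and score them with cosine similarity.
emb1 = model.encode([p[0] for p in pairs], convert_to_tensor=True)
emb2 = model.encode([p[1] for p in pairs], convert_to_tensor=True)
pred = util.cos_sim(emb1, emb2).diagonal().tolist()

# Rank correlation between model scores and human labels gives a rough
# signal of how well the STSB-trained model transfers to the new domain.
rho, _ = spearmanr(gold, pred)
print(f"Spearman rho: {rho:.3f}")
```

A low correlation on the new domain relative to the source benchmark would suggest limited transferability, which is the kind of comparison the study formalizes with its statistical analysis.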
Similar Papers
Annotating Training Data for Conditional Semantic Textual Similarity Measurement using Large Language Models
Computation and Language
Makes computers understand sentences better, even when tricky.
Does Language Model Understand Language?
Computation and Language
Makes computers understand language nuances better.
Quantifying Dataset Similarity to Guide Transfer Learning
Machine Learning (Stat)
Tells computers if learning from old data helps.