Privacy-Preserved Automated Scoring using Federated Learning for Educational Research
By: Ehsan Latif, Xiaoming Zhai
Potential Business Impact:
Schools can jointly score test answers without sharing student data.
Data privacy remains a critical concern in educational research, requiring strict adherence to ethical standards and regulatory protocols. While traditional approaches rely on anonymization and centralized data collection, they often expose raw student data to security vulnerabilities and impose substantial logistical overhead. In this study, we propose a federated learning (FL) framework for automated scoring of educational assessments that eliminates the need to share sensitive data across institutions. Our approach leverages parameter-efficient fine-tuning of large language models (LLMs) with Low-Rank Adaptation (LoRA), enabling each client (school) to train locally while sharing only optimized model updates. To address data heterogeneity, we implement an adaptive weighted aggregation strategy that considers both client performance and data volume. We benchmark our model against two state-of-the-art FL methods and a centralized learning baseline using NGSS-aligned multi-label science assessment data from nine middle schools. Results show that our model achieves the highest accuracy (94.5%) among FL approaches and performs within 0.5-1.0 percentage points of the centralized model on this metric. Additionally, it achieves comparable rubric-level scoring accuracy, with only a 1.3% difference in rubric match and a lower score deviation (MAE), highlighting its effectiveness in preserving both prediction quality and interpretability.
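To illustrate the aggregation idea described in the abstract, below is a minimal sketch of a server-side step that blends data-volume and client-performance weights when averaging the shared LoRA updates. The function name `aggregate_lora_updates`, the mixing coefficient `alpha`, and the use of local validation accuracy as the performance signal are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def aggregate_lora_updates(client_updates, client_accuracies, client_sizes, alpha=0.5):
    """Hypothetical performance- and size-weighted aggregation of LoRA deltas.

    client_updates: list of dicts mapping parameter name -> LoRA delta (np.ndarray)
    client_accuracies: list of local validation accuracies in [0, 1]
    client_sizes: list of local training-set sizes
    alpha: assumed coefficient mixing data-volume and performance weights
    """
    sizes = np.asarray(client_sizes, dtype=float)
    accs = np.asarray(client_accuracies, dtype=float)

    # Normalize each signal so each set of weights sums to 1.
    size_w = sizes / sizes.sum()
    perf_w = accs / accs.sum()

    # Blend the two weightings; alpha=1 recovers plain FedAvg-style size weighting.
    weights = alpha * size_w + (1.0 - alpha) * perf_w

    # Weighted average of each shared LoRA parameter across clients.
    aggregated = {}
    for name in client_updates[0]:
        aggregated[name] = sum(w * upd[name] for w, upd in zip(weights, client_updates))
    return aggregated
```

In this sketch, only the low-rank adapter parameters are exchanged and averaged, which matches the abstract's point that clients train locally and share only optimized model updates rather than raw student responses.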
Similar Papers
Federated Learning: A Survey on Privacy-Preserving Collaborative Intelligence
Machine Learning (CS)
Trains computers together without sharing private info.
Can Federated Learning Safeguard Private Data in LLM Training? Vulnerabilities, Attacks, and Defense Evaluation
Machine Learning (CS)
Shows how attackers can steal private info from shared AI training.
Experiences Building Enterprise-Level Privacy-Preserving Federated Learning to Power AI for Science
Distributed, Parallel, and Cluster Computing
Lets AI learn from private data safely.