A Leakage-Aware Data Layer For Student Analytics: The Capire Framework For Multilevel Trajectory Modeling
By: H. R. Paz
Potential Business Impact:
Finds students likely to quit school early.
Predictive models for student dropout, while often accurate, frequently rely on opportunistic feature sets and suffer from undocumented data leakage, limiting their explanatory power and institutional usefulness. This paper introduces a leakage-aware data layer for student trajectory analytics, which serves as the methodological foundation for the CAPIRE framework for multilevel modelling. We propose a feature engineering design that organizes predictors into four levels: N1 (personal and socio-economic attributes), N2 (entry moment and academic history), N3 (curricular friction and performance), and N4 (institutional and macro-context variables)As a core component, we formalize the Value of Observation Time (VOT) as a critical design parameter that rigorously separates observation windows from outcome horizons, preventing data leakage by construction. An illustrative application in a long-cycle engineering program (1,343 students, ~57% dropout) demonstrates that VOT-restricted multilevel features support robust archetype discovery. A UMAP + DBSCAN pipeline uncovers 13 trajectory archetypes, including profiles of "early structural crisis," "sustained friction," and "hidden vulnerability" (low friction but high dropout). Bootstrap and permutation tests confirm these archetypes are statistically robust and temporally stable. We argue that this approach transforms feature engineering from a technical step into a central methodological artifact. This data layer serves as a disciplined bridge between retention theory, early-warning systems, and the future implementation of causal inference and agent-based modelling (ABM) within the CAPIRE program.
Similar Papers
The CAPIRE Curriculum Graph: Structural Feature Engineering for Curriculum-Constrained Student Modelling in Higher Education
Computers and Society
Helps predict which students might struggle in college.
CAPIRE Intervention Lab: An Agent-Based Policy Simulation Environment for Curriculum-Constrained Engineering Programmes
Computers and Society
Tests teaching ideas to help students stay in school.
Regularity as Structural Amplifier, Not Trap: A Causal and Archetype-Based Analysis of Dropout in a Constrained Engineering Curriculum
Computers and Society
Fixes school rules that push smart students out.