A Micro-Macro Machine Learning Framework for Predicting Childhood Obesity Risk Using NHANES and Environmental Determinants
By: Eswarasanthosh Kumar Mamillapalli, Nishtha Sharma
Childhood obesity remains a major public health challenge in the United States, strongly influenced by a combination of individual-level, household-level, and environmental-level risk factors. Traditional epidemiological studies typically analyze these levels independently, limiting insights into how structural environmental conditions interact with individual-level characteristics to influence health outcomes. In this study, we introduce a micro-macro machine learning framework that integrates (1) individual-level anthropometric and socioeconomic data from NHANES and (2) macro-level structural environment features, including food access, air quality, and socioeconomic vulnerability extracted from USDA and EPA datasets. Four machine learning models Logistic Regression, Random Forest, XGBoost, and LightGBM were trained to predict obesity using NHANES microdata. XGBoost achieved the strongest performance. A composite environmental vulnerability index (EnvScore) was constructed using normalized indicators from USDA and EPA at the state level. Multi-level comparison revealed strong geographic similarity between states with high environmental burden and the nationally predicted micro-level obesity risk distribution. This demonstrates the feasibility of integrating multi-scale datasets to identify environment-driven disparities in obesity risk. This work contributes a scalable, data-driven, multi-level modeling pipeline suitable for public health informatics, demonstrating strong potential for expansion into causal modeling, intervention planning, and real-time analytics.
Similar Papers
A Multi-level Analysis of Factors Associated with Student Performance: A Machine Learning Approach to the SAEB Microdata
Machine Learning (CS)
Finds why some students learn better than others.
Hybrid(Penalized Regression and MLP) Models for Outcome Prediction in HDLSS Health Data
Machine Learning (CS)
Finds diabetes risk using health survey data.
A system for objectively measuring behavior and the environment to support large-scale studies on childhood obesity
Computers and Society
Tracks your health to fight obesity.