Demographic Inference from Social Media Data with Multimodal Foundation Models: Strategies, Evaluation, and Benchmarking
By: Hao Yang , Angela Yao , Eric Chang and more
Potential Business Impact:
Reads social media to guess age, gender, race.
Demographic inference plays a crucial role in understanding the representativeness and equity of social media-based research. However, existing methods typically rely on a single modality, such as text, image, or network, and are limited to predicting one or two demographic attributes, constraining their generalizability and robustness across populations. This study leverages GPT-5, a state-of-the-art multimodal foundation model, to infer age, gender, and race from social media profiles. Using a dataset of 263 publicly available X (formerly Twitter) users, we design a progressive multimodal framework that incrementally incorporates usernames, profile descriptions, tweets, and profile images to examine how each information source contributes to inference accuracy. Results show a consistent improvement across all conditions, with the inclusion of textual and visual cues substantially enhancing performance. GPT-5 achieves an overall accuracy of 0.90 for age, 0.98 for gender, and 0.85 for race, outperforming existing models under equivalent inputs. These findings demonstrate the potential of large multimodal foundation models to capture complex, cross-modal demographic cues with minimal task-specific training. The study further highlights a transparent, interpretable approach to multimodal reasoning that advances the accuracy, fairness, and scalability of demographic inference in social data analytics.
Similar Papers
Graph-based LLM over Semi-Structured Population Data for Dynamic Policy Response
Artificial Intelligence
Helps health officials understand people's needs faster.
Unsupervised Multimodal Graph-based Model for Geo-social Analysis
Social and Information Networks
Finds important news in social media posts.
Cross-Modal Prototype Augmentation and Dual-Grained Prompt Learning for Social Media Popularity Prediction
Information Retrieval
Predicts social media post popularity better.