Score: 0

Demographic Inference from Social Media Data with Multimodal Foundation Models: Strategies, Evaluation, and Benchmarking

Published: November 26, 2025 | arXiv ID: 2512.03064v1

By: Hao Yang , Angela Yao , Eric Chang and more

Potential Business Impact:

Reads social media to guess age, gender, race.

Business Areas:
Text Analytics Data and Analytics, Software

Demographic inference plays a crucial role in understanding the representativeness and equity of social media-based research. However, existing methods typically rely on a single modality, such as text, image, or network, and are limited to predicting one or two demographic attributes, constraining their generalizability and robustness across populations. This study leverages GPT-5, a state-of-the-art multimodal foundation model, to infer age, gender, and race from social media profiles. Using a dataset of 263 publicly available X (formerly Twitter) users, we design a progressive multimodal framework that incrementally incorporates usernames, profile descriptions, tweets, and profile images to examine how each information source contributes to inference accuracy. Results show a consistent improvement across all conditions, with the inclusion of textual and visual cues substantially enhancing performance. GPT-5 achieves an overall accuracy of 0.90 for age, 0.98 for gender, and 0.85 for race, outperforming existing models under equivalent inputs. These findings demonstrate the potential of large multimodal foundation models to capture complex, cross-modal demographic cues with minimal task-specific training. The study further highlights a transparent, interpretable approach to multimodal reasoning that advances the accuracy, fairness, and scalability of demographic inference in social data analytics.

Country of Origin
🇺🇸 United States

Page Count
21 pages

Category
Computer Science:
Social and Information Networks