Text-to-Image Models and Their Representation of People from Different Nationalities Engaging in Activities

Published: April 8, 2025 | arXiv ID: 2504.06313v3

By: Abdulkareem Alsudais

Potential Business Impact:

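AI image generators often depict people from many countries in traditional attire rather than everyday clothing, even when that attire is impractical for the activity shown.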

Business Areas:
Text Analytics, Data and Analytics, Software

This paper investigates how a popular Text-to-Image (T2I) model represents people from 208 different nationalities when prompted to generate images of individuals engaging in typical activities. Two scenarios were developed, and 644 images were generated based on input prompts that specified nationalities. The results show that in one scenario, 52.88% of images, and in the other, 27.4%, depict individuals wearing traditional attire. A statistically significant relationship was observed between this representation pattern and regions. This indicates that the issue disproportionately affects certain areas, particularly the Middle East & North Africa and Sub-Saharan Africa. A notable association with income groups was also found. CLIP, ALIGN, and GPT-4.1 mini were used to measure alignment scores between generated images and 3320 prompts and captions, with findings indicating statistically significant higher scores for images featuring individuals in traditional attire in one scenario. The study also examined revised prompts, finding that the word "traditional" was added by the model to 88.46% of prompts for one scenario. These findings provide valuable insights into T2I models' representation of individuals across different countries, demonstrating how the examined model prioritizes traditional characteristics despite their impracticality for the given activities.
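To make the "statistically significant relationship" between traditional-attire depictions and regions more concrete, here is a minimal sketch of one common way to test such an association: a chi-square test of independence on a contingency table. The summary above does not say which test the paper used, so the choice of test is an assumption. The counts and the three-region layout below are hypothetical, invented for illustration, and are not data from the paper.

```python
import math


def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom) for a contingency table.

    `table` is a list of rows; each row holds observed counts per column.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts, NOT from the paper.
# Rows: three made-up regions. Columns: (traditional attire, other attire).
hypothetical_counts = [
    [30, 10],
    [20, 20],
    [10, 30],
]

stat, dof = chi_square_independence(hypothetical_counts)

# With exactly 2 degrees of freedom, the chi-square survival function
# has the closed form exp(-x / 2); this shortcut does not hold for other dof.
p_value = math.exp(-stat / 2)

print(f"chi2={stat:.2f}, dof={dof}, p={p_value:.4g}")
```

A p-value below the chosen threshold (commonly 0.05) would suggest that the share of traditional-attire images differs across regions. For tables with other degrees of freedom, a statistics library would be needed to compute the p-value.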

Page Count
18 pages

Category
Computer Science:
CV and Pattern Recognition