Fairness is in the details: Face Dataset Auditing

Published: April 11, 2025 | arXiv ID: 2504.08396v3

By: Valentin Lafargue, Emmanuelle Claeys, Jean-Michel Loubes

Potential Business Impact:

Detects demographic bias in the image datasets used to train AI models.

Business Areas:
Facial Recognition Data and Analytics, Software

Auditing involves verifying the proper implementation of a given policy. As such, auditing is essential for ensuring compliance with the principles of fairness, equity, and transparency mandated by the European Union's AI Act. Moreover, biases present during the training phase of a learning system can persist in the modeling process and result in discrimination against certain subgroups of individuals when the model is deployed in production. Assessing bias in image datasets is a particularly complex task, as it first requires a feature extraction step and then requires accounting for the extraction's quality in the statistical tests. This paper proposes a robust methodology for auditing image datasets based on so-called "sensitive" features, such as gender, age, and ethnicity. The proposed methodology consists of a feature extraction phase and a statistical analysis phase. The first phase introduces a novel convolutional neural network (CNN) architecture specifically designed for extracting sensitive features with a limited number of manual annotations. The second phase compares the distributions of sensitive features across subgroups using a novel statistical test that accounts for the imprecision of the feature extraction model. Our pipeline constitutes a comprehensive and fully automated methodology for dataset auditing. We illustrate our approach using two manually annotated datasets. The code and datasets are available at github.com/ValentinLafargue/FairnessDetails.
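The second phase above compares sensitive-feature distributions while accounting for the extractor's imprecision. The paper's exact test is not reproduced here, but the core idea can be sketched with a standard misclassification correction: given the attribute classifier's confusion matrix (estimated on a small annotated sample), invert it to recover corrected class proportions per subgroup before comparing them. All numbers and names below are hypothetical illustrations, not the authors' data.

```python
import numpy as np

def corrected_proportions(pred_counts, confusion):
    """Correct predicted-class proportions for classifier error.

    confusion[i, j] = P(predicted class j | true class i), so the
    observed proportions satisfy observed = confusion.T @ true.
    Solving that linear system recovers the true proportions
    (a standard misclassification correction; the paper's own
    statistical test also propagates this imprecision).
    """
    observed = pred_counts / pred_counts.sum()
    true = np.linalg.solve(confusion.T, observed)
    return np.clip(true, 0.0, 1.0)

# Hypothetical confusion matrix of an imperfect binary attribute
# classifier, assumed estimated on a manually annotated sample.
conf = np.array([[0.9, 0.1],
                 [0.2, 0.8]])

# Predicted attribute counts (class 0, class 1) in two dataset subgroups.
group_a = np.array([480, 520])
group_b = np.array([700, 300])

p_a = corrected_proportions(group_a, conf)
p_b = corrected_proportions(group_b, conf)
print(p_a, p_b)  # corrected proportions per subgroup
```

Comparing `p_a` and `p_b` (rather than the raw predicted proportions) avoids attributing the classifier's own bias to the dataset; the gap between the raw and corrected proportions shows why the extraction quality must enter the statistical test.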

Country of Origin
🇫🇷 France

Repos / Data Links
github.com/ValentinLafargue/FairnessDetails
Page Count
31 pages

Category
Statistics: Applications