Beyond Internal Data: Bounding and Estimating Fairness from Incomplete Data
By: Varsha Ramineni, Hossein A. Rahmani, Emine Yilmaz, and more
Potential Business Impact:
Tests AI fairness using separate data sources.
Plain English Summary
Imagine you're trying to make sure a computer program that decides who gets a loan isn't biased against certain groups of people. The problem is, companies often can't collect or share all the personal information needed to run that check because of privacy rules. This new method lets us check for bias even when we only have bits and pieces of information from different places. This matters because it means we can build fairer AI systems for things like loans or jobs, even when data is limited, making sure everyone gets a fair shot.
Ensuring fairness in AI systems is critical, especially in high-stakes domains such as lending, hiring, and healthcare. This urgency is reflected in emerging global regulations that mandate fairness assessments and independent bias audits. However, procuring the complete data needed for fairness testing remains a significant challenge. In industry settings, legal and privacy concerns restrict the collection of demographic data required to assess group disparities, and auditors face practical and cultural challenges in gaining access to data. In practice, the data relevant for fairness testing is often split across separate sources: internal datasets held by institutions containing predictive attributes, and external public datasets such as census data containing protected attributes, each providing only partial, marginal information. Our work seeks to leverage such separate data sources to estimate model fairness when complete data is inaccessible. We propose using the available separate data to estimate a set of feasible joint distributions and then compute the corresponding set of plausible fairness metrics. Through simulations and experiments on real-world data, we demonstrate that we can derive meaningful bounds on fairness metrics and obtain reliable estimates of the true metric. Our results show that this approach can serve as a practical and effective solution for fairness testing in real-world settings where access to complete data is restricted.
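To give a concrete sense of the core idea, here is a minimal Python sketch of bounding a fairness metric from marginal information alone. It is not the authors' estimator: it assumes we only know the overall positive-prediction rate from the internal dataset and the protected-group proportion from an external source (hypothetical numbers below), and uses Fréchet inequalities to bound the unknown joint probability, which in turn bounds the demographic parity gap. With only these coarsest marginals the resulting bounds are wide; the paper's approach tightens them by constraining the set of feasible joint distributions with additional shared attributes.

```python
# Illustrative sketch: bounding demographic parity from marginals only.
# Assumed (hypothetical) inputs: P(Yhat=1) from the internal dataset and
# P(A=1) from an external source such as census data. The joint
# P(Yhat=1, A=1) is unobserved but constrained by Frechet inequalities.

def frechet_bounds(p_yhat1: float, p_a1: float) -> tuple[float, float]:
    """Bounds on the unknown joint probability P(Yhat=1, A=1)."""
    lower = max(0.0, p_yhat1 + p_a1 - 1.0)
    upper = min(p_yhat1, p_a1)
    return lower, upper

def demographic_parity_gap_bounds(p_yhat1: float, p_a1: float) -> tuple[float, float]:
    """Bounds on P(Yhat=1 | A=1) - P(Yhat=1 | A=0) given only the marginals."""
    joint_lo, joint_hi = frechet_bounds(p_yhat1, p_a1)
    p_a0 = 1.0 - p_a1
    gaps = []
    # The gap is monotone in the joint probability, so evaluating the
    # two Frechet endpoints is enough to get the extreme values.
    for joint in (joint_lo, joint_hi):
        rate_a1 = joint / p_a1                 # selection rate in group A=1
        rate_a0 = (p_yhat1 - joint) / p_a0     # selection rate in group A=0
        gaps.append(rate_a1 - rate_a0)
    return min(gaps), max(gaps)

if __name__ == "__main__":
    # Hypothetical marginals: 30% positive predictions, protected group is 20% of population.
    lo, hi = demographic_parity_gap_bounds(p_yhat1=0.30, p_a1=0.20)
    print(f"Demographic parity gap lies in [{lo:.3f}, {hi:.3f}]")
```

Running this with the hypothetical numbers above yields a gap anywhere in roughly [-0.375, 0.875], which illustrates why the paper's tighter, data-driven feasible sets are needed to obtain informative bounds in practice.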
Similar Papers
Beyond Internal Data: Constructing Complete Datasets for Fairness Testing
Machine Learning (CS)
Tests AI for fairness without private data.
Exposing the Illusion of Fairness: Auditing Vulnerabilities to Distributional Manipulation Attacks
Machine Learning (CS)
Finds and stops AI from being unfair.
AI Fairness Beyond Complete Demographics: Current Achievements and Future Directions
Computers and Society
Makes AI fair even with missing information.