Quantitative Auditing of AI Fairness with Differentially Private Synthetic Data
By: Chih-Cheng Rex Yuan, Bow-Yaw Wang
Potential Business Impact:
Tests AI for fairness without seeing private data.
Fairness auditing of AI systems can identify and quantify biases. However, traditional auditing using real-world data raises security and privacy concerns. It exposes auditors to security risks as they become custodians of sensitive information and targets for cyberattacks. Privacy risks arise even without direct breaches, since data analyses can inadvertently expose confidential information. To address these concerns, we propose a framework that leverages differentially private synthetic data to audit the fairness of AI systems. By applying privacy-preserving mechanisms, the framework generates synthetic data that mirrors the statistical properties of the original dataset while ensuring privacy. This approach balances the goal of rigorous fairness auditing with the need for strong privacy protections. Through experiments on real-world datasets such as Adult, COMPAS, and Diabetes, we compare fairness metrics computed on synthetic and real data. By analyzing the alignment and discrepancies between these metrics, we assess the capacity of synthetic data to preserve the fairness properties of real data. Our results demonstrate the framework's ability to enable meaningful fairness evaluations while safeguarding sensitive information, proving its applicability across critical and sensitive domains.
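To make the auditing idea concrete, below is a minimal, hedged sketch of the two ingredients the abstract describes: generating differentially private synthetic data and comparing a fairness metric computed on real versus synthetic records. The synthetic-data step here uses a simple Laplace-noised joint histogram and the fairness metric is demographic parity difference; the toy columns, the epsilon value, and the logistic-regression stand-in for the audited AI system are illustrative assumptions, not the paper's exact mechanism or experimental setup.

```python
# Sketch: DP synthetic data via a Laplace-perturbed joint histogram, then a
# fairness audit (demographic parity difference) on real vs. synthetic data.
# All dataset columns and parameters are hypothetical placeholders.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def dp_synthetic(df: pd.DataFrame, epsilon: float, n_rows: int) -> pd.DataFrame:
    """Sample synthetic rows from an epsilon-DP noisy joint histogram."""
    counts = df.value_counts()  # joint histogram over all (categorical) columns
    noisy = counts + rng.laplace(scale=1.0 / epsilon, size=len(counts))
    probs = np.clip(noisy.to_numpy(), 0, None)
    probs = probs / probs.sum()
    idx = rng.choice(len(counts), size=n_rows, p=probs)
    return pd.DataFrame(list(counts.index[idx]), columns=df.columns)

def demographic_parity_diff(pred, group) -> float:
    """|P(pred=1 | group=1) - P(pred=1 | group=0)|."""
    pred, group = np.asarray(pred), np.asarray(group)
    return abs(pred[group == 1].mean() - pred[group == 0].mean())

# Toy stand-in for a sensitive dataset (e.g. Adult-like): group, feature, label.
real = pd.DataFrame({
    "group":   rng.integers(0, 2, 5000),
    "feature": rng.integers(0, 4, 5000),
})
real["label"] = ((real["feature"] + real["group"] + rng.integers(0, 2, 5000)) >= 3).astype(int)

# The "AI system" under audit; here just a classifier trained on the real data.
model = LogisticRegression().fit(real[["group", "feature"]], real["label"])

# Generate private synthetic data and audit fairness on both datasets.
synth = dp_synthetic(real, epsilon=1.0, n_rows=len(real))
dpd_real = demographic_parity_diff(model.predict(real[["group", "feature"]]), real["group"])
dpd_synth = demographic_parity_diff(model.predict(synth[["group", "feature"]]), synth["group"])
print(f"demographic parity diff, real data:      {dpd_real:.3f}")
print(f"demographic parity diff, synthetic data: {dpd_synth:.3f}")
```

The closer the two printed metrics agree, the better the synthetic data preserves the fairness properties of the real data, which is the alignment the paper's experiments measure on Adult, COMPAS, and Diabetes.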
Similar Papers
Beyond Internal Data: Constructing Complete Datasets for Fairness Testing
Machine Learning (CS)
Tests AI for fairness without private data.
Synthetic Data Privacy Metrics
Machine Learning (CS)
Makes fake data as good as real, safely.
Can Synthetic Data be Fair and Private? A Comparative Study of Synthetic Data Generation and Fairness Algorithms
Machine Learning (CS)
Makes learning software fairer and safer for students.