Representative Language Generation
By: Charlotte Peale, Vinod Raman, Omer Reingold
Potential Business Impact:
Makes AI generate fair and varied text.
We introduce "representative generation," extending the theoretical framework for generation proposed by Kleinberg et al. (2024) and formalized by Li et al. (2024), to additionally address diversity and bias concerns in generative models. Our notion requires outputs of a generative model to proportionally represent groups of interest from the training data. We characterize representative uniform and non-uniform generation, introducing the "group closure dimension" as a key combinatorial quantity. For representative generation in the limit, we analyze both information-theoretic and computational aspects, demonstrating feasibility for countably infinite hypothesis classes and collections of groups under certain conditions, but proving a negative result for computability using only membership queries. This contrasts with Kleinberg et al.'s (2024) positive results for standard generation in the limit. Our findings provide a rigorous foundation for developing more diverse and representative generative models.
Similar Papers
Language Generation in the Limit: Noise, Loss, and Feedback
Data Structures and Algorithms
Teaches computers to learn any language perfectly.
Density Measures for Language Generation
Combinatorics
Helps computers learn more words from any language.
Disjoint Generative Models
Machine Learning (CS)
Makes private data for computers without sharing secrets.