Concentration bounds on response-based vector embeddings of black-box generative models
By: Aranyak Acharyya, Joshua Agterberg, Youngser Park, and others
Potential Business Impact:
Helps compare different AI models by the answers they give.
Generative models, such as large language models and text-to-image diffusion models, can generate relevant responses to user-given queries. Response-based vector embeddings of generative models facilitate statistical analysis and inference on a given collection of black-box generative models. The Data Kernel Perspective Space embedding, previously introduced in the literature, is one method of obtaining response-based vector embeddings for a given set of generative models. In this paper, under appropriate regularity conditions, we establish high-probability concentration bounds on the sample vector embeddings of a given set of generative models obtained through the Data Kernel Perspective Space embedding. Our results quantify the number of sampled responses needed to approximate the population-level vector embeddings to a desired level of accuracy. The algebraic tools used to establish our results can further be used to establish concentration bounds on Classical Multidimensional Scaling embeddings in general, whenever the dissimilarities are observed with noise.
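To make the pipeline concrete, here is a minimal sketch of a response-based embedding of the kind the abstract describes. This is not the authors' implementation: the function names (`cmds`, `dkps_embedding`), the choice to summarize each model by the mean of its response embeddings over a shared query set, and the Euclidean dissimilarity between those summaries are all simplifying assumptions made for illustration. The toy noise scale of 1/sqrt(r) mimics how averaging r sampled responses per query concentrates the sample summary around its population counterpart.

```python
import numpy as np

def cmds(D, d=2):
    """Classical Multidimensional Scaling: embed n items described by an
    n-by-n dissimilarity matrix D into R^d via the top-d eigenpairs of
    the double-centered squared-dissimilarity matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # Gram-like matrix
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:d]           # top-d eigenvalues
    return vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0.0, None))

def dkps_embedding(mean_response_embeddings, d=2):
    """DKPS-style sketch (assumed form): each row summarizes one model by
    concatenating its mean response embeddings over a shared query set;
    models are compared via Euclidean distances between these summaries."""
    M = np.stack(mean_response_embeddings)     # (n_models, n_queries * emb_dim)
    D = np.linalg.norm(M[:, None, :] - M[None, :, :], axis=-1)
    return cmds(D, d)

# Toy usage: 5 hypothetical models, 10 queries, 3-dim response embeddings.
# Averaging r sampled responses per query shrinks the estimation noise in
# each model's summary at the familiar 1/sqrt(r) rate.
rng = np.random.default_rng(0)
true_means = rng.normal(size=(5, 10 * 3))      # population-level summaries
r = 50                                         # responses sampled per query
noisy_means = true_means + rng.normal(scale=1 / np.sqrt(r), size=true_means.shape)
X_hat = dkps_embedding(list(noisy_means), d=2)
print(X_hat.shape)                             # (5, 2): one point per model
```

Under these assumptions, increasing r drives the noisy dissimilarities toward their population values, so the CMDS output concentrates around the population embedding; the paper's contribution is to make this intuition precise with explicit high-probability bounds.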
Similar Papers
Educational Cone Model in Embedding Vector Spaces
Artificial Intelligence
Finds best computer words for easy/hard school lessons.
Concentration bounds for intrinsic dimension estimation using Gaussian kernels
Statistics Theory
Helps computers guess how complex data is.
On the Theoretical Limitations of Embedding-Based Retrieval
Information Retrieval
Makes computer searches better, even for simple questions.