Advancing Semantic Caching for LLMs with Domain-Specific Embeddings and Synthetic Data
By: Waris Gill, Justin Cechmanek, Tyler Hutcherson, and more
Potential Business Impact:
Lets LLM applications reuse answers to previously seen, semantically similar questions, making responses faster and cheaper.
This report investigates enhancing semantic caching effectiveness by employing specialized, fine-tuned embedding models. Semantic caching relies on embedding similarity rather than exact key matching, presenting unique challenges in balancing precision, query latency, and computational efficiency. We propose leveraging smaller, domain-specific embedding models, fine-tuned with targeted real-world and synthetically generated datasets. Our empirical evaluations demonstrate that compact embedding models fine-tuned for just one epoch on specialized datasets significantly surpass both state-of-the-art open-source and proprietary alternatives in precision and recall. Moreover, we introduce a novel synthetic data generation pipeline for the semantic cache that mitigates the challenge of limited domain-specific annotated data, further boosting embedding performance. Our approach effectively balances computational overhead and accuracy, establishing a viable and efficient strategy for practical semantic caching implementations.
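The abstract does not include code, so the following is only an illustrative sketch of the lookup step it describes: a cache keyed by embedding similarity rather than exact string matching. The class name `SemanticCache`, the caller-supplied `embed_fn` (standing in for a compact, domain-fine-tuned embedding model), and the `threshold` parameter are hypothetical and not taken from the paper.

```python
import numpy as np


class SemanticCache:
    """Minimal semantic cache sketch: returns a stored response when a new
    query's embedding is sufficiently similar to a cached query's embedding,
    instead of requiring an exact key match."""

    def __init__(self, embed_fn, threshold=0.85):
        self.embed_fn = embed_fn    # maps a query string to a 1-D numpy vector
        self.threshold = threshold  # cosine-similarity cutoff for a cache hit
        self.keys = []              # embeddings of cached queries
        self.values = []            # cached LLM responses

    @staticmethod
    def _cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    def get(self, query):
        """Return the cached response whose query is most similar, if it
        clears the similarity threshold; otherwise return None (cache miss)."""
        if not self.keys:
            return None
        q = self.embed_fn(query)
        sims = [self._cosine(q, k) for k in self.keys]
        best = int(np.argmax(sims))
        return self.values[best] if sims[best] >= self.threshold else None

    def put(self, query, response):
        """Store a query/response pair for future semantic lookups."""
        self.keys.append(self.embed_fn(query))
        self.values.append(response)
```

Raising `threshold` favors precision (fewer incorrect cache hits) at the cost of recall, which is the balance the report argues fine-tuned, domain-specific embedding models improve.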
Similar Papers
An Ensemble Embedding Approach for Improving Semantic Caching Performance in LLM-based Systems
Machine Learning (CS)
Makes AI answer questions faster and cheaper.
Adaptation of Embedding Models to Financial Filings via LLM Distillation
Computation and Language
Teaches AI to find specific money information faster.
Rethinking Data: Towards Better Performing Domain-Specific Small Language Models
Computation and Language
Makes small AI models answer questions as well as big ones.