How DDAIR you? Disambiguated Data Augmentation for Intent Recognition
By: Galo Castillo-López, Alexis Lombard, Nasredine Semmar and more
Potential Business Impact:
Filters out ambiguous LLM-generated training examples so that intent classifiers learn from cleaner data.
Large Language Models (LLMs) are effective for data augmentation in classification tasks like intent detection. In some cases, they inadvertently produce examples that are ambiguous with regard to untargeted classes. We present DDAIR (Disambiguated Data Augmentation for Intent Recognition) to mitigate this problem. We use Sentence Transformers to detect ambiguous class-guided augmented examples generated by LLMs for intent recognition in low-resource scenarios. We identify synthetic examples that are semantically more similar to another intent than to their target one. We also provide an iterative re-generation method to mitigate such ambiguities. Our findings show that sentence embeddings effectively help to (re)generate less ambiguous examples, and suggest promising potential to improve classification performance in scenarios where intents are loosely or broadly defined.
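The detection and re-generation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several pieces are assumptions made only for illustration: `toy_embed` is a bag-of-words stand-in for a Sentence Transformers encoder, each intent is represented by the centroid of its seed examples, and `generate` is a hypothetical callable standing in for an LLM call. The abstract does not say how similarity to an intent is computed, so the centroid choice in particular is a guess.

```python
import math
from collections import Counter


def toy_embed(text):
    # Assumption: bag-of-words counts stand in for a real sentence embedding.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as Counters.
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    # Mean of sparse vectors; assumed here as the representation of an intent.
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return Counter({k: v / len(vectors) for k, v in total.items()})


def is_ambiguous(example, target, seeds, embed=toy_embed):
    # Flag an example that is closer to some other intent than to its target.
    vec = embed(example)
    sims = {
        intent: cosine(vec, centroid([embed(s) for s in examples]))
        for intent, examples in seeds.items()
    }
    return any(sims[i] > sims[target] for i in sims if i != target)


def regenerate_until_clear(generate, target, seeds, max_rounds=3):
    # Ask the (hypothetical) generator again until a candidate passes the check.
    for attempt in range(max_rounds):
        candidate = generate(target, attempt)
        if not is_ambiguous(candidate, target, seeds):
            return candidate
    return None  # give up after max_rounds ambiguous candidates


seeds = {
    "check_balance": ["what is my account balance", "show my balance"],
    "transfer_money": ["transfer money to my friend",
                       "send money to another account"],
}
canned = ["send money to my friend", "show my account balance"]
result = regenerate_until_clear(lambda target, i: canned[i],
                                "check_balance", seeds)
```

In this toy run the first canned candidate sits closer to `transfer_money` than to its target `check_balance`, so it is rejected and the second candidate is kept. With a real encoder, only `embed` would need to be swapped out, assuming its output is paired with a matching similarity function.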
Similar Papers
LLM-Guided Synthetic Augmentation (LGSA) for Mitigating Bias in AI Systems
Computation and Language
Makes AI fairer by teaching it about everyone.
LLMCARE: Alzheimer's Detection via Transformer Models Enhanced by LLM-Generated Synthetic Data
Computation and Language
Finds early signs of memory loss in voices.
Adversarial Distilled Retrieval-Augmented Guarding Model for Online Malicious Intent Detection
Cryptography and Security
Stops bad online messages faster and better.