
Multi-Intent Recognition in Dialogue Understanding: A Comparison Between Smaller Open-Source LLMs

Published: September 12, 2025 | arXiv ID: 2509.10010v1

By: Adnan Ahmad, Philine Kowol, Stefan Hillmann and more

Potential Business Impact:

Helps chatbots understand many requests at once.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

In this paper, we provide an extensive analysis of multi-label intent classification using Large Language Models (LLMs) that are open-source, publicly available, and can be run on consumer hardware. We use the MultiWOZ 2.1 dataset, a benchmark in the dialogue system domain, to investigate the efficacy of three popular open-source pre-trained LLMs: LLama2-7B-hf, Mistral-7B-v0.1, and Yi-6B. We perform the classification task in a few-shot setup, giving 20 examples in the prompt along with instructions. Our approach methodically assesses these models on multi-label intent classification and compares them across several performance metrics. Additionally, we compare the performance of the instruction-based fine-tuning approach with supervised learning, using the smaller transformer model BertForSequenceClassification as a baseline. To evaluate the models, we use accuracy, precision, and recall, as well as micro, macro, and weighted F1 scores. We also report inference time and VRAM requirements, among other measures. Mistral-7B-v0.1 outperforms the other two generative models on 11 of the 14 intent classes in terms of F-score, with a weighted average of 0.50. It also has a relatively lower Hamming Loss and higher Jaccard Similarity, making it the winning model in the few-shot setting. We find that the BERT-based supervised classifier outperforms the best-performing few-shot generative LLM. The study provides a framework for using small open-source LLMs to detect complex multi-intent dialogues, enhancing the Natural Language Understanding aspect of task-oriented chatbots.
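The multi-label metrics named in the abstract (Hamming Loss, Jaccard Similarity, and micro, macro, and weighted F1) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the intent names and the three toy utterances below are hypothetical, and the functions assume every predicted label belongs to the known label set.

```python
from typing import Dict, List, Set

def hamming_loss(y_true: List[Set[str]], y_pred: List[Set[str]], labels: List[str]) -> float:
    """Fraction of (utterance, label) decisions that are wrong."""
    # The symmetric difference counts labels that are missed or wrongly added.
    wrong = sum(len(t ^ p) for t, p in zip(y_true, y_pred))
    return wrong / (len(y_true) * len(labels))

def jaccard_similarity(y_true: List[Set[str]], y_pred: List[Set[str]]) -> float:
    """Mean over utterances of |true & pred| / |true | pred|."""
    scores = []
    for t, p in zip(y_true, y_pred):
        union = t | p
        # Two empty label sets are treated as a perfect match (an assumption).
        scores.append(len(t & p) / len(union) if union else 1.0)
    return sum(scores) / len(scores)

def f1_scores(y_true: List[Set[str]], y_pred: List[Set[str]], labels: List[str]) -> Dict[str, float]:
    """Micro, macro, and support-weighted F1 over the label set."""
    tp = {l: 0 for l in labels}
    fp = {l: 0 for l in labels}
    fn = {l: 0 for l in labels}
    for t, p in zip(y_true, y_pred):
        for l in labels:
            if l in t and l in p:
                tp[l] += 1
            elif l in p:
                fp[l] += 1
            elif l in t:
                fn[l] += 1

    def f1(tp_: int, fp_: int, fn_: int) -> float:
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    per_label = {l: f1(tp[l], fp[l], fn[l]) for l in labels}
    support = {l: tp[l] + fn[l] for l in labels}  # number of true occurrences
    total_support = sum(support.values())
    return {
        "micro": f1(sum(tp.values()), sum(fp.values()), sum(fn.values())),
        "macro": sum(per_label.values()) / len(labels),
        "weighted": (sum(per_label[l] * support[l] for l in labels) / total_support
                     if total_support else 0.0),
    }

# Hypothetical intent labels and toy predictions, for illustration only.
LABELS = ["find_hotel", "book_hotel", "find_train"]
Y_TRUE = [{"find_hotel", "book_hotel"}, {"find_train"}, {"find_hotel"}]
Y_PRED = [{"find_hotel"}, {"find_train"}, {"find_hotel", "find_train"}]

if __name__ == "__main__":
    print("Hamming loss:", hamming_loss(Y_TRUE, Y_PRED, LABELS))
    print("Jaccard similarity:", jaccard_similarity(Y_TRUE, Y_PRED))
    print("F1:", f1_scores(Y_TRUE, Y_PRED, LABELS))
```

A lower Hamming Loss and a higher Jaccard Similarity both indicate that the predicted intent sets sit closer to the reference sets. Macro F1 weighs every intent class equally, while weighted F1 scales each class by how often it occurs, which matters when intent classes are imbalanced.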

Fine-tuning of lightweight large language models for sentiment classification on heterogeneous financial textual data

Computation and Language

Small AI models understand money news well.

30 Nov 2025

89%

Large Language Model Data Generation for Enhanced Intent Recognition in German Speech

Computation and Language

Helps old German speakers talk to computers.

8 Aug 2025

89%

A Comprehensive Analysis of Large Language Model Outputs: Similarity, Diversity, and Bias

Computation and Language

Helps understand how AI writing is unique and fair.

14 May 2025


Country of Origin

🇩🇪 Germany

Repos / Data Links

github.com github.com

Page Count

12 pages
