Adapting Small Language Models to Low-Resource Domains: A Case Study in Hindi Tourism QA
By: Sandipan Majhi, Paheli Bhattacharya
Potential Business Impact:
Helps small AI models answer tourism questions in Hindi.
Domain-specific question answering in low-resource languages faces two key challenges: scarcity of annotated datasets and limited domain knowledge in general-purpose language models. In this work, we present a multi-stage fine-tuning strategy to adapt lightweight language models to the Hindi tourism domain by leveraging both original and synthetic training data. Synthetic question-answer pairs are generated with large language models (LLaMA-70B, Phi-14B) and used to augment the limited original dataset. We explore several training methodologies and analyse their impact on domain generalisation. Our results demonstrate that large models can efficiently generate synthetic data, while small models can effectively adapt to it, offering a scalable pathway for low-resource, domain-specific QA.
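The abstract's recipe, generating synthetic QA pairs with a large model and then fine-tuning a small one on the augmented set, can be sketched roughly as below. This is a minimal illustration assuming the Hugging Face transformers and datasets libraries; the checkpoint names, prompt, and hyperparameters are placeholders, not the authors' actual configuration.

```python
# Sketch of the two-stage recipe described in the abstract.
# All model names, prompts, and hyperparameters are illustrative assumptions.
import torch
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Stage 1: a large instruction-tuned model writes synthetic Hindi QA pairs
# from domain passages (placeholder checkpoint, not necessarily the paper's).
gen_name = "meta-llama/Llama-3.1-70B-Instruct"
gen_tok = AutoTokenizer.from_pretrained(gen_name)
gen_model = AutoModelForCausalLM.from_pretrained(
    gen_name, torch_dtype=torch.bfloat16, device_map="auto"
)

def synthesize_qa(passage: str) -> str:
    """Prompt the large model to write one Hindi question-answer pair."""
    prompt = (
        "Read the Hindi tourism passage below and write one question "
        "and its answer in Hindi.\n\n" + passage + "\n\nQ:"
    )
    inputs = gen_tok(prompt, return_tensors="pt").to(gen_model.device)
    out = gen_model.generate(**inputs, max_new_tokens=128, do_sample=True, top_p=0.9)
    # Keep only the newly generated tokens (the QA pair itself).
    return gen_tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

passages = ["ताजमहल आगरा में यमुना नदी के किनारे स्थित है ..."]  # placeholder corpus
synthetic = [{"text": p + "\nQ:" + synthesize_qa(p)} for p in passages]

# Stage 2: fine-tune a lightweight model on the augmented data
# (placeholder small checkpoint).
small_name = "microsoft/phi-2"
small_tok = AutoTokenizer.from_pretrained(small_name)
small_tok.pad_token = small_tok.eos_token  # many small LMs ship without a pad token
small_model = AutoModelForCausalLM.from_pretrained(small_name)

def tokenize(batch):
    enc = small_tok(batch["text"], truncation=True, max_length=512, padding="max_length")
    # Causal-LM objective; a fuller version would mask pad positions with -100.
    enc["labels"] = enc["input_ids"].copy()
    return enc

train_ds = Dataset.from_list(synthetic).map(tokenize, batched=True, remove_columns=["text"])
trainer = Trainer(
    model=small_model,
    args=TrainingArguments(
        output_dir="hindi-tourism-qa",
        num_train_epochs=3,
        per_device_train_batch_size=4,
    ),
    train_dataset=train_ds,
)
trainer.train()
```

In the paper's setting, the synthetic pairs would be mixed with the original annotated data across a multi-stage schedule; the single pass above only illustrates the mechanics.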
Similar Papers
Tourism Question Answer System in Indian Language using Domain-Adapted Foundation Models
Computation and Language
Helps computers answer questions about Indian temples.
Leveraging Synthetic Data for Question Answering with Multilingual LLMs in the Agricultural Domain
Computation and Language
Helps farmers get farming advice in their language.