PolicyBot - Reliable Question Answering over Policy Documents
By: Gautam Nagarajan, Omir Kumar, Sudarsun Santhiappan
Potential Business Impact:
Answers questions about government rules easily.
All citizens of a country are affected by the laws and policies introduced by their government. These laws and policies serve essential functions, such as granting citizens certain rights or imposing specific obligations. However, the documents that record them are often lengthy, complex, and difficult to navigate, making it challenging for citizens to locate and understand relevant information. This work presents PolicyBot, a retrieval-augmented generation (RAG) system designed to answer user queries over policy documents with a focus on transparency and reproducibility. The system combines domain-specific semantic chunking, multilingual dense embeddings, multi-stage retrieval with reranking, and source-aware generation to provide responses grounded in the original documents. We implemented citation tracing to reduce hallucinations and improve user trust, and evaluated alternative retrieval and generation configurations to identify effective design choices. The end-to-end pipeline is built entirely with open-source tools, enabling easy adaptation to other domains requiring document-grounded question answering. This work highlights design considerations, practical challenges, and lessons learned in deploying trustworthy RAG systems for governance-related contexts.
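To make the pipeline shape in the abstract concrete, the following is a minimal sketch of chunking, two-stage retrieval with reranking, and citation tracing. It is not the authors' implementation: the abstract does not name models or libraries, so bag-of-words cosine similarity stands in for multilingual dense embeddings, query-term coverage stands in for a learned reranker, paragraph splitting stands in for semantic chunking, and the generation step is replaced by quoting the top passage. All names (`Chunk`, `chunk_document`, `retrieve`, `rerank`, `answer_with_citations`) and the sample policy texts are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str  # hypothetical identifier of the source document
    index: int   # position of the chunk within that document
    text: str


def tokenize(text):
    # Lowercase alphanumeric tokens; a deliberately simple assumption.
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk_document(doc_id, text):
    # Stand-in for semantic chunking: split on blank lines.
    parts = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [Chunk(doc_id, i, p) for i, p in enumerate(parts)]


def cosine(a, b):
    # Cosine similarity between two term-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=3):
    # Stage 1: score every chunk cheaply and keep the top k with any overlap.
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(c.text))), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored[:k] if score > 0]


def rerank(query, candidates, k=1):
    # Stage 2: reorder the shortlist by the fraction of query terms covered.
    q_terms = set(tokenize(query))

    def coverage(chunk):
        if not q_terms:
            return 0.0
        return len(q_terms & set(tokenize(chunk.text))) / len(q_terms)

    return sorted(candidates, key=coverage, reverse=True)[:k]


def answer_with_citations(query, chunks):
    # Quote the best passage and attach "doc_id#index" citations to it.
    top = rerank(query, retrieve(query, chunks))
    if not top:
        return "No supporting passage found.", []
    citations = [f"{c.doc_id}#{c.index}" for c in top]
    context = " ".join(c.text for c in top)
    return f"{context} [{', '.join(citations)}]", citations


# Hypothetical sample documents, for illustration only.
docs = {
    "leave_policy": "Employees receive twenty days of paid leave each year."
                    "\n\nUnused leave lapses in March.",
    "travel_policy": "Travel claims must be filed within thirty days.",
}
chunks = [c for doc_id, text in docs.items() for c in chunk_document(doc_id, text)]
answer, citations = answer_with_citations(
    "How many days of paid leave do employees receive?", chunks
)
print(answer)
```

Because every chunk carries its document identifier and position, each answer can be traced back to the passage that supports it, and a query with no overlapping passage yields an explicit "no supporting passage" result rather than an ungrounded answer. In a real system the two scoring functions would be replaced by an embedding model and a reranking model.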
Similar Papers
LegalRAG: A Hybrid RAG System for Multilingual Legal Information Retrieval
Information Retrieval
Finds legal information in police documents faster.
All for law and law for all: Adaptive RAG Pipeline for Legal Research
Computation and Language
Helps lawyers find correct legal answers faster.
A Knowledge Graph and a Tripartite Evaluation Framework Make Retrieval-Augmented Generation Scalable and Transparent
Information Retrieval
Chatbots answer questions more accurately and reliably.