An LLM Agentic Approach for Legal-Critical Software: A Case Study for Tax Prep Software
By: Sina Gogani-Khiabani, Ashutosh Trivedi, Diptikalyan Saha, and more
Potential Business Impact:
Makes tax preparation software follow the law more reliably.
Large language models (LLMs) show promise for translating natural-language statutes into executable logic, but reliability in legally critical settings remains challenging due to ambiguity and hallucinations. We present an agentic approach to developing legal-critical software, using U.S. federal tax preparation as a case study. The key challenge is test-case generation under the oracle problem, where determining correct outputs requires interpreting the law. Building on metamorphic testing, we introduce higher-order metamorphic relations that compare system outputs across structured shifts among similar individuals. Because authoring such relations is tedious and error-prone, we use an LLM-driven, role-based framework to automate test generation and code synthesis. We implement a multi-agent system that translates tax code into executable software and incorporates a metamorphic-testing agent that searches for counterexamples. In experiments, our framework using a smaller model (GPT-4o-mini) achieves a worst-case pass rate of 45% on complex tax-code tasks, outperforming frontier models (GPT-4o and Claude 3.5, 9–15%). These results support agentic LLM methodologies as a path to robust, trustworthy legal-critical software built from natural-language specifications.
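To make the testing idea concrete, below is a minimal sketch of a metamorphic check in Python. The `Taxpayer` record, the `compute_tax` stand-in, and the specific relation shown (tax liability should not decrease when only income increases, everything else held fixed) are illustrative assumptions, not the paper's exact artifacts; this is a simple first-order relation in the same spirit, whereas the paper's higher-order relations additionally compare such shifts across pairs of similar individuals.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Taxpayer:
    """Illustrative taxpayer profile; fields are assumptions for this sketch."""
    income: float
    filing_status: str
    num_dependents: int

def compute_tax(t: Taxpayer) -> float:
    """Stand-in for the LLM-synthesized tax software under test.
    A toy progressive schedule so the sketch runs end to end."""
    brackets = [(11_000, 0.10), (44_725, 0.12), (float("inf"), 0.22)]
    tax, lower = 0.0, 0.0
    for upper, rate in brackets:
        if t.income > lower:
            tax += (min(t.income, upper) - lower) * rate
            lower = upper
    return tax

def check_income_monotonicity(base: Taxpayer, delta: float) -> bool:
    """Metamorphic relation: raising income while holding all other fields
    fixed should never lower the computed tax. No oracle for the 'true' tax
    is needed; we only compare the system's outputs on related inputs."""
    shifted = replace(base, income=base.income + delta)
    return compute_tax(shifted) >= compute_tax(base)

# A metamorphic-testing agent would search this input space for counterexamples.
if __name__ == "__main__":
    base = Taxpayer(income=40_000, filing_status="single", num_dependents=1)
    for delta in (1, 100, 5_000):
        assert check_income_monotonicity(base, delta), f"violation at delta={delta}"
    print("no counterexample found in this small sweep")
```

The point of the sketch is that a relation like this sidesteps the oracle problem: a violation flags a bug in the synthesized software without anyone having to compute the legally correct tax for either input.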
Similar Papers
Querying Large Automotive Software Models: Agentic vs. Direct LLM Approaches
Software Engineering
Lets computers understand complex code using simple words.
LLM Assisted Coding with Metamorphic Specification Mutation Agent
Software Engineering
Helps AI write better computer code.
Challenges in Testing Large Language Model Based Software: A Faceted Taxonomy
Software Engineering
Tests AI to make sure it works right.