Taxon: Hierarchical Tax Code Prediction with Semantically Aligned LLM Expert Guidance
By: Jihang Li , Qing Liu , Zulong Chen and more
Tax code prediction is a crucial yet underexplored task in automating invoicing and compliance management for large-scale e-commerce platforms. Each product must be accurately mapped to a node within a multi-level taxonomic hierarchy defined by national standards, where errors lead to financial inconsistencies and regulatory risks. This paper presents Taxon, a semantically aligned and expert-guided framework for hierarchical tax code prediction. Taxon integrates (i) a feature-gating mixture-of-experts architecture that adaptively routes multi-modal features across taxonomy levels, and (ii) a semantic consistency model distilled from large language models acting as domain experts to verify alignment between product titles and official tax definitions. To address noisy supervision in real business records, we design a multi-source training pipeline that combines curated tax databases, invoice validation logs, and merchant registration data to provide both structural and semantic supervision. Extensive experiments on the proprietary TaxCode dataset and public benchmarks demonstrate that Taxon achieves state-of-the-art performance, outperforming strong baselines. Further, an additional full hierarchical paths reconstruction procedure significantly improves structural consistency, yielding the highest overall F1 scores. Taxon has been deployed in production within Alibaba's tax service system, handling an average of over 500,000 tax code queries per day and reaching peak volumes above five million requests during business event with improved accuracy, interpretability, and robustness.
Similar Papers
TaxoAlign: Scholarly Taxonomy Generation Using Language Models
Computation and Language
Helps computers make study guides like experts.
QuanTaxo: A Quantum Approach to Self-Supervised Taxonomy Expansion
Social and Information Networks
Makes online knowledge maps grow automatically.
Domain-Adaptive Small Language Models for Structured Tax Code Prediction
Machine Learning (CS)
Helps companies correctly label products for taxes.