LLM-as-classifier: Semi-Supervised, Iterative Framework for Hierarchical Text Classification using Large Language Models
By: Doohee You, Andy Parisi, Zach Vander Velden, and more
Potential Business Impact:
Makes AI programs sort text into categories more reliably.
The advent of Large Language Models (LLMs) has provided unprecedented capabilities for analyzing unstructured text data. However, deploying these models as reliable, robust, and scalable classifiers in production environments presents significant methodological challenges. Standard fine-tuning approaches can be resource-intensive and often struggle with the dynamic data distributions common in real-world industry settings. In this paper, we propose a comprehensive, semi-supervised framework that leverages the zero- and few-shot capabilities of LLMs to build hierarchical text classifiers, offering a solution to these industry-wide challenges. Our methodology emphasizes an iterative, human-in-the-loop process that begins with domain knowledge elicitation and progresses through prompt refinement, hierarchical expansion, and multi-faceted validation. We introduce techniques for assessing and mitigating sequence-based biases and outline a protocol for continuous monitoring and adaptation. This framework is designed to bridge the gap between the raw power of LLMs and the practical need for accurate, interpretable, and maintainable classification systems in industry applications.
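As a rough illustration of two ideas at the heart of this abstract, the sketch below shows (a) two-stage hierarchical classification, where an LLM first picks a top-level category and then a child of that category, and (b) a simple probe for sequence-based bias that shuffles label order between calls. The taxonomy, the call_llm stand-in, and the prompt wording are illustrative assumptions, not the paper's actual implementation.

```python
import random

# Hypothetical two-level label hierarchy: the paper does not publish its
# taxonomy, so these categories are illustrative only.
HIERARCHY = {
    "Billing": ["Refund request", "Invoice error"],
    "Technical": ["Login failure", "Crash report"],
}

def call_llm(prompt: str) -> str:
    """Stand-in for a real chat-completion call; swap in your own client.
    For demonstration it naively echoes the first listed category."""
    line = next(l for l in prompt.splitlines() if l.startswith("Categories:"))
    return line.split(": ", 1)[1].split(", ")[0]

def classify_level(text: str, labels: list[str]) -> str:
    # Shuffle label order on each call to probe sequence-based bias: if
    # predictions flip with label order, the prompt is position-sensitive
    # and the ordering must be controlled or averaged over.
    shuffled = random.sample(labels, k=len(labels))
    prompt = (
        "Classify the text into exactly one of these categories.\n"
        f"Categories: {', '.join(shuffled)}\n"
        f"Text: {text}\n"
        "Answer with the category name only."
    )
    answer = call_llm(prompt).strip()
    # Fall back to the first canonical label if the model answers off-list.
    return answer if answer in labels else labels[0]

def classify_hierarchical(text: str) -> tuple[str, str]:
    # Stage 1: choose a top-level class; stage 2: choose one of its children.
    top = classify_level(text, list(HIERARCHY))
    sub = classify_level(text, HIERARCHY[top])
    return top, sub

if __name__ == "__main__":
    print(classify_hierarchical("I was charged twice for my subscription."))
```

In practice, the per-level prompts and the hierarchy itself would be refined iteratively with human review, as the abstract describes; the shuffling shown here is one cheap way to surface position sensitivity before deploying such a classifier.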
Similar Papers
Small sample-based adaptive text classification through iterative and contrastive description refinement
Machine Learning (CS)
Teaches computers to sort text without new training.
LLM-MemCluster: Empowering Large Language Models with Dynamic Memory for Text Clustering
Computation and Language
Lets computers group texts by meaning better.
LLM driven Text-to-Table Generation through Sub-Tasks Guidance and Iterative Refinement
Computation and Language
Helps computers turn messy text into organized tables.