Additive Large Language Models for Semi-Structured Text
By: Karthikeyan K, Raghuveer Thirukovalluru, David Carlson
Potential Business Impact:
Shows why doctors' notes predict patient health risks.
Large Language Models have advanced clinical text classification, but their opaque predictions remain a critical barrier to practical adoption in research and clinical settings where investigators and physicians need to understand which parts of a patient's record drive risk signals. To address this challenge, we introduce CALM (Classification with Additive Large Language Models), an interpretable framework for semi-structured text where inputs are composed of semantically meaningful components, such as sections of an admission note or question-answer fields from an intake form. CALM predicts outcomes as the additive sum of each component's contribution, making these contributions part of the forward computation itself and enabling faithful explanations at both the patient and population level. The additive structure also enables clear visualizations, such as component-level risk curves similar to those used in generalized additive models, making the learned relationships easier to inspect and communicate. Although CALM expects semi-structured inputs, many clinical documents already have this form, and similar structure can often be automatically extracted from free-text notes. CALM achieves performance comparable to conventional LLM classifiers while improving trust, supporting quality-assurance checks, and revealing clinically meaningful patterns during model development and auditing.
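The additive idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`additive_predict`, `toy_scorer`), the keyword weights, the bias, the example note, and the use of a sigmoid to turn the summed score into a probability are all assumptions made for illustration. In CALM the per-component score would come from a language model; here a toy keyword scorer stands in for it so the example runs with the standard library alone.

```python
import math
from typing import Callable, Dict, Tuple

def additive_predict(
    components: Dict[str, str],
    score_component: Callable[[str, str], float],
    bias: float = 0.0,
) -> Tuple[float, Dict[str, float]]:
    """Score each component independently, then sum the scores.

    Returns the predicted probability and the per-component
    contributions that produced it.
    """
    contributions = {
        name: score_component(name, text) for name, text in components.items()
    }
    # The prediction is the bias plus the sum of component contributions.
    logit = bias + sum(contributions.values())
    # Assumption: a sigmoid maps the summed score to a probability.
    probability = 1.0 / (1.0 + math.exp(-logit))
    return probability, contributions

# Hypothetical stand-in for an LLM-based component scorer (made-up weights).
TOY_WEIGHTS = {"chest pain": 1.2, "smoker": 0.8, "no known allergies": -0.3}

def toy_scorer(name: str, text: str) -> float:
    lowered = text.lower()
    return sum(w for keyword, w in TOY_WEIGHTS.items() if keyword in lowered)

# Made-up semi-structured note: one entry per section.
note = {
    "chief_complaint": "Chest pain for two hours",
    "social_history": "Former smoker, quit 2015",
    "allergies": "No known allergies",
}

prob, contributions = additive_predict(note, toy_scorer, bias=-1.0)
for section, value in contributions.items():
    print(f"{section}: {value:+.2f}")
print(f"predicted risk: {prob:.3f}")
```

Because the output is computed directly from the per-section values, the printed contributions are the explanation itself rather than a post-hoc approximation, which is the property the abstract refers to as faithful.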