Neurosymbolic Information Extraction from Transactional Documents
By: Arthur Hemmer, Mickaël Coustaty, Nicola Bartolo and more
Potential Business Impact:
Helps computers understand money papers better.
This paper presents a neurosymbolic framework for information extraction from documents, evaluated on transactional documents. We introduce a schema-based approach that integrates symbolic validation methods to enable more effective zero-shot output and knowledge distillation. The methodology uses language models to generate candidate extractions, which are then filtered through syntactic-, task-, and domain-level validation to ensure adherence to domain-specific arithmetic constraints. Our contributions include a comprehensive schema for transactional documents, relabeled datasets, and an approach for generating high-quality labels for knowledge distillation. Experimental results demonstrate significant improvements in $F_1$-scores and accuracy, highlighting the effectiveness of neurosymbolic validation in transactional document processing.
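The abstract describes language-model candidates being filtered through syntactic-, task-, and domain-level validation, with the domain level enforcing arithmetic constraints. The following minimal Python sketch illustrates that idea under stated assumptions. The schema (the field names `line_items`, `quantity`, `unit_price`, `amount`, `subtotal`, `tax`, `total`), the rounding tolerance, and every function name are hypothetical and chosen for illustration. They are not the paper's actual schema or implementation.

```python
import json
from typing import Optional

# Hypothetical, minimal schema; the paper's transactional schema is more comprehensive.
REQUIRED_TOP = {"line_items": list, "subtotal": (int, float), "tax": (int, float), "total": (int, float)}
REQUIRED_ITEM = {"description": str, "quantity": (int, float), "unit_price": (int, float), "amount": (int, float)}
TOLERANCE = 0.01  # assumed rounding tolerance, in currency units


def syntactic_check(raw: str) -> Optional[dict]:
    """Syntactic level: the model output must parse as a JSON object."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return data if isinstance(data, dict) else None


def task_check(data: dict) -> bool:
    """Task level: required fields exist and have the expected types."""
    for key, typ in REQUIRED_TOP.items():
        if not isinstance(data.get(key), typ):
            return False
    for item in data["line_items"]:
        if not isinstance(item, dict):
            return False
        for key, typ in REQUIRED_ITEM.items():
            if not isinstance(item.get(key), typ):
                return False
    return True


def close(a: float, b: float) -> bool:
    """Compare two amounts within the assumed rounding tolerance."""
    return abs(a - b) <= TOLERANCE


def domain_check(data: dict) -> bool:
    """Domain level: arithmetic constraints a consistent transaction must satisfy."""
    items = data["line_items"]
    # Each line item: quantity * unit_price == amount
    if not all(close(i["quantity"] * i["unit_price"], i["amount"]) for i in items):
        return False
    # Line-item amounts sum to the subtotal
    if not close(sum(i["amount"] for i in items), data["subtotal"]):
        return False
    # Subtotal plus tax equals the total
    return close(data["subtotal"] + data["tax"], data["total"])


def validate(raw: str) -> Optional[dict]:
    """Apply all three levels in order; return the parsed candidate only if every check passes."""
    data = syntactic_check(raw)
    if data is None or not task_check(data) or not domain_check(data):
        return None
    return data


def filter_candidates(candidates: list[str]) -> list[dict]:
    """Keep only the candidates that pass validation."""
    return [d for d in (validate(c) for c in candidates) if d is not None]


candidates = [
    # Consistent: 2 * 3.5 = 7.0, subtotal 7.0, 7.0 + 1.4 = 8.4
    '{"line_items": [{"description": "Widget", "quantity": 2, "unit_price": 3.5, "amount": 7.0}],'
    ' "subtotal": 7.0, "tax": 1.4, "total": 8.4}',
    # Fails the domain level: total does not equal subtotal + tax
    '{"line_items": [{"description": "Widget", "quantity": 2, "unit_price": 3.5, "amount": 7.0}],'
    ' "subtotal": 7.0, "tax": 1.4, "total": 9.4}',
    # Fails the syntactic level: truncated JSON
    '{"line_items": [{"description": "Widget"',
]
kept = filter_candidates(candidates)
print(len(kept))  # 1
```

Candidates that pass all three levels could then serve as training labels for a smaller model. This is one plausible reading of how validation supports knowledge distillation, and the paper's exact selection procedure may differ.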
Similar Papers
A Neurosymbolic Approach to Natural Language Formalization and Verification
Computation and Language
Makes AI follow rules perfectly, like a robot lawyer.
Information Extraction from Conversation Transcripts: Neuro-Symbolic vs. LLM
Computation and Language
Helps computers understand farm talk better.
Neuro-Symbolic Frameworks: Conceptual Characterization and Empirical Comparative Analysis
Artificial Intelligence
Helps computers learn and explain answers better.