A Multi-Agent System for Information Extraction from the Chemical Literature
By: Yufan Chen, Ching Ting Leung, Bowen Yu and more
Potential Business Impact:
Helps computers understand chemistry pictures for research.
High-quality chemical databases are the cornerstone of AI-powered chemical research. Automatic extraction of chemical information from the literature is essential for constructing reaction databases, but it is currently limited by the multimodality and stylistic variability of chemical information. In this work, we developed a multimodal large language model (MLLM)-based multi-agent system for automatic chemical information extraction. The system uses the MLLM's reasoning capability to understand the structure of complex chemical graphics, decompose the extraction task into sub-tasks, and coordinate a set of specialized agents to solve them. It achieved an F1 score of 80.8% on a benchmark dataset of complex chemical reaction graphics from the literature, compared with 35.6% for the previous state-of-the-art model. It also showed consistent improvements on key sub-tasks, including molecular image recognition, reaction image parsing, named entity recognition, and text-based reaction extraction. This work is a critical step toward automated extraction of chemical information into structured datasets, which should strongly support AI-driven chemical research.
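For readers unfamiliar with the metric, the F1 score quoted above is the harmonic mean of precision and recall. The following is a minimal sketch of how such a score could be computed for extracted reactions. It assumes exact-match comparison between predicted and gold reactions, which may differ from the benchmark's actual matching rules. The function name `f1_score` and the toy reactions are hypothetical illustrations, not data from the paper.

```python
def f1_score(predicted: set, gold: set) -> tuple[float, float, float]:
    """Return (precision, recall, F1) under exact-match set comparison.

    Assumption: a predicted reaction counts as correct only if it is
    identical to a gold reaction. Real benchmarks may use softer matching.
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy data: each reaction is a (reactant, product) pair of
# SMILES strings. These are illustrative only.
gold = {
    ("CCO", "CC=O"),
    ("CC=O", "CC(=O)O"),
    ("c1ccccc1", "c1ccccc1Br"),
    ("CCBr", "CCO"),
}
predicted = {
    ("CCO", "CC=O"),
    ("CC=O", "CC(=O)O"),
    ("c1ccccc1", "c1ccccc1Br"),
    ("CCO", "CCCl"),  # spurious extraction, not in the gold set
}

precision, recall, f1 = f1_score(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

In this toy case three of the four predictions are correct and three of the four gold reactions are recovered, so precision, recall, and F1 all equal 0.75.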
Similar Papers
A Multi-Agent System Enables Versatile Information Extraction from the Chemical Literature
Artificial Intelligence
Helps computers learn chemistry from pictures.
Optimizing Data Extraction from Materials Science Literature: A Study of Tools Using Large Language Models
Digital Libraries
AI finds science facts in papers faster.
Multi-agent systems for chemical engineering: A review and perspective
Multiagent Systems
Teams of AI help design new chemicals faster.