Beyond Protein Language Models: An Agentic LLM Framework for Mechanistic Enzyme Design

Published: November 24, 2025 | arXiv ID: 2511.19423v1

By: Bruno Jacob, Khushbu Agarwal, Marcel Baer and more

Potential Business Impact:

AI helps scientists invent new proteins faster.

Business Areas:
Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

We present Genie-CAT, a tool-augmented large-language-model (LLM) system designed to accelerate scientific hypothesis generation in protein design. Using metalloproteins (e.g., ferredoxins) as a case study, Genie-CAT integrates four capabilities into a unified agentic workflow: literature-grounded reasoning through retrieval-augmented generation (RAG), structural parsing of Protein Data Bank files, electrostatic potential calculations, and machine-learning prediction of redox properties. By coupling natural-language reasoning with data-driven and physics-based computation, the system generates mechanistically interpretable, testable hypotheses linking sequence, structure, and function. In proof-of-concept demonstrations, Genie-CAT autonomously identifies residue-level modifications near [Fe-S] clusters that affect redox tuning, reproducing expert-derived hypotheses in a fraction of the time. The framework highlights how AI agents combining language models with domain-specific tools can bridge symbolic reasoning and numerical simulation, transforming LLMs from conversational assistants into partners for computational discovery.
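To make the structural-parsing capability concrete, the following is a minimal sketch of how residues near iron atoms might be located in a Protein Data Bank file. It is not Genie-CAT's implementation, which the abstract does not describe. The names `Atom`, `parse_atoms`, and `residues_near_iron`, and the 6.0 angstrom default cutoff, are hypothetical choices made for illustration. The sketch relies only on the fixed-column layout of `ATOM`/`HETATM` records and assumes the element column is populated.

```python
import math
from typing import NamedTuple


class Atom(NamedTuple):
    """One ATOM/HETATM record (hypothetical container for this sketch)."""
    record: str    # "ATOM" or "HETATM"
    name: str      # atom name, e.g. "SG"
    res_name: str  # residue name, e.g. "CYS"
    chain: str     # chain identifier
    res_seq: int   # residue sequence number
    xyz: tuple     # (x, y, z) in angstroms
    element: str   # element symbol, upper-cased


def parse_atoms(pdb_text):
    """Parse ATOM/HETATM records using the fixed-column PDB layout."""
    atoms = []
    for line in pdb_text.splitlines():
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue
        line = line.ljust(80)  # pad short lines so slicing is safe
        atoms.append(Atom(
            record=record,
            name=line[12:16].strip(),
            res_name=line[17:20].strip(),
            chain=line[21],
            res_seq=int(line[22:26]),
            xyz=(float(line[30:38]), float(line[38:46]), float(line[46:54])),
            element=line[76:78].strip().upper(),
        ))
    return atoms


def residues_near_iron(atoms, cutoff=6.0):
    """Return sorted (chain, res_seq, res_name) for protein residues that
    have at least one atom within `cutoff` angstroms of any Fe atom.

    The cutoff is an illustrative assumption, not a value from the paper.
    """
    irons = [a for a in atoms if a.element == "FE"]
    near = set()
    for atom in atoms:
        if atom.record != "ATOM":  # skip ligands, waters, the cluster itself
            continue
        if any(math.dist(atom.xyz, fe.xyz) <= cutoff for fe in irons):
            near.add((atom.chain, atom.res_seq, atom.res_name))
    return sorted(near)
```

In an agentic setting, a function like `residues_near_iron` would be one tool among several; its output (a list of candidate residues) could then be passed to electrostatics or redox-prediction tools, which are outside the scope of this sketch.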

Page Count
10 pages

Category
Quantitative Biology:
Quantitative Methods