Beyond Protein Language Models: An Agentic LLM Framework for Mechanistic Enzyme Design
By: Bruno Jacob, Khushbu Agarwal, Marcel Baer and more
Potential Business Impact:
AI helps scientists invent new proteins faster.
We present Genie-CAT, a tool-augmented large-language-model (LLM) system designed to accelerate scientific hypothesis generation in protein design. Using metalloproteins (e.g., ferredoxins) as a case study, Genie-CAT integrates four capabilities into a unified agentic workflow: literature-grounded reasoning through retrieval-augmented generation (RAG), structural parsing of Protein Data Bank files, electrostatic potential calculations, and machine-learning prediction of redox properties. By coupling natural-language reasoning with data-driven and physics-based computation, the system generates mechanistically interpretable, testable hypotheses linking sequence, structure, and function. In proof-of-concept demonstrations, Genie-CAT autonomously identifies residue-level modifications near [Fe-S] clusters that affect redox tuning, reproducing expert-derived hypotheses in a fraction of the time. The framework highlights how AI agents combining language models with domain-specific tools can bridge symbolic reasoning and numerical simulation, transforming LLMs from conversational assistants into partners for computational discovery.
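To make the "structural parsing of Protein Data Bank files" step more concrete, here is a minimal sketch of how residues close to an iron center might be listed from PDB coordinate records. It is an illustration under stated assumptions, not Genie-CAT's implementation, which the abstract does not describe. The function names, the 5 angstrom cutoff, the rule that iron atoms are HETATM records whose atom name starts with "FE", and the demo coordinates are all hypothetical.

```python
import math


def parse_atoms(pdb_text):
    """Read ATOM/HETATM records using the fixed-column PDB layout."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            atoms.append({
                "name": line[12:16].strip(),      # atom name, columns 13-16
                "res_name": line[17:20].strip(),  # residue name, columns 18-20
                "chain": line[21],                # chain ID, column 22
                "res_seq": int(line[22:26]),      # residue number, columns 23-26
                "xyz": (float(line[30:38]),       # x, y, z in columns 31-54
                        float(line[38:46]),
                        float(line[46:54])),
                "hetero": line.startswith("HETATM"),
            })
    return atoms


def residues_near_iron(atoms, cutoff=5.0):
    """Return protein residues with any atom within `cutoff` of an Fe atom.

    Assumption: iron atoms are HETATM records named FE, FE1, FE2, ...
    """
    irons = [a for a in atoms if a["hetero"] and a["name"].startswith("FE")]
    near = set()
    for atom in atoms:
        if atom["hetero"]:
            continue  # skip cofactor atoms; only protein residues are reported
        if any(math.dist(atom["xyz"], fe["xyz"]) <= cutoff for fe in irons):
            near.add((atom["chain"], atom["res_seq"], atom["res_name"]))
    return sorted(near)


def make_record(kind, serial, name, res_name, chain, res_seq, x, y, z):
    """Hypothetical helper: format one fixed-column record for the demo."""
    return (f"{kind:<6}{serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}"
            f"    {x:8.3f}{y:8.3f}{z:8.3f}")


# Made-up coordinates, for illustration only (not taken from a real structure).
demo_pdb = "\n".join([
    make_record("ATOM", 1, "CA", "ALA", "A", 10, 12.0, 0.0, 0.0),
    make_record("ATOM", 2, "SG", "CYS", "A", 42, 2.3, 0.0, 0.0),
    make_record("HETATM", 3, "FE1", "SF4", "A", 101, 0.0, 0.0, 0.0),
])

print(residues_near_iron(parse_atoms(demo_pdb)))  # prints [('A', 42, 'CYS')]
```

In a workflow like the one the abstract outlines, a residue list of this kind could serve as input to the electrostatics and redox-prediction steps; how Genie-CAT actually passes data between its tools is not stated here.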
Similar Papers
Swarms of Large Language Model Agents for Protein Sequence Design with Experimental Validation
Artificial Intelligence
Creates new proteins for medicine and materials.
Leveraging Large Language Models for enzymatic reaction prediction and characterization
Artificial Intelligence
Helps computers guess how tiny body machines work.
Large language model-empowered next-generation computer-aided engineering
Computational Engineering, Finance, and Science
Lets computers solve hard science problems faster.