Swarms of Large Language Model Agents for Protein Sequence Design with Experimental Validation

Published: November 27, 2025 | arXiv ID: 2511.22311v1

By: Fiona Y. Wang, Di Sheng Lee, David L. Kaplan, and others

Affiliations: Massachusetts Institute of Technology

Potential Business Impact:

Creates new proteins for medicine and materials.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Designing proteins de novo with tailored structural, physicochemical, and functional properties remains a grand challenge in biotechnology, medicine, and materials science, due to the vastness of sequence space and the complex coupling between sequence, structure, and function. Current state-of-the-art generative methods, such as protein language models (PLMs) and diffusion-based architectures, often require extensive fine-tuning, task-specific data, or model reconfiguration to support objective-directed design, thereby limiting their flexibility and scalability. To overcome these limitations, we present a decentralized, agent-based framework inspired by swarm intelligence for de novo protein design. In this approach, multiple large language model (LLM) agents operate in parallel, each assigned to a specific residue position. These agents iteratively propose context-aware mutations by integrating design objectives, local neighborhood interactions, and memory and feedback from previous iterations. This position-wise, decentralized coordination enables the emergent design of diverse, well-defined sequences without relying on motif scaffolds or multiple sequence alignments; designs were validated experimentally on proteins with alpha-helical and coil structures. Through analyses of residue conservation, structure-based metrics, and sequence convergence and embeddings, we demonstrate that the framework exhibits emergent behaviors and navigates the protein fitness landscape effectively. Our method achieves efficient, objective-directed designs within a few GPU-hours and operates entirely without fine-tuning or specialized training, offering a generalizable and adaptable solution for protein design. Beyond proteins, the approach lays the groundwork for collective LLM-driven design across biomolecular systems and other scientific discovery tasks.
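The position-wise, decentralized loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: each LLM agent is replaced by a hypothetical `PositionAgent` whose `propose` method uses a simple heuristic instead of an LLM call, and the design objective is a made-up helix-propensity proxy (`score`), not whatever objectives the paper uses. The neighborhood `window`, the memory rule, and the "accept non-worsening changes" update are all assumptions chosen for illustration.

```python
import random
from dataclasses import dataclass, field

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Hypothetical objective ingredient: a crude set of helix-favoring residues.
HELIX_FAVORING = set("ALEKMQR")


@dataclass
class PositionAgent:
    """Stand-in for one LLM agent that owns a single residue position (hypothetical)."""
    position: int
    memory: list = field(default_factory=list)  # (proposed residue, score delta) history

    def propose(self, sequence: str, window: int, rng: random.Random) -> str:
        # Local neighborhood that a real agent would receive as prompt context.
        lo, hi = max(0, self.position - window), self.position + window + 1
        neighborhood = sequence[lo:hi]
        # Memory/feedback: skip residues that previously lowered the score here.
        bad = {res for res, delta in self.memory if delta < 0}
        candidates = [aa for aa in AMINO_ACIDS if aa not in bad] or list(AMINO_ACIDS)
        # Toy heuristic in place of LLM reasoning: favor helix formers and
        # mildly prefer residues not already present in the neighborhood.
        weights = [
            (3.0 if aa in HELIX_FAVORING else 1.0) * (0.5 if aa in neighborhood else 1.0)
            for aa in candidates
        ]
        return rng.choices(candidates, weights=weights, k=1)[0]


def score(sequence: str) -> float:
    """Hypothetical design objective: fraction of helix-favoring residues."""
    return sum(aa in HELIX_FAVORING for aa in sequence) / len(sequence)


def run_swarm(length: int = 20, iterations: int = 30, window: int = 2, seed: int = 0):
    rng = random.Random(seed)
    sequence = "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
    agents = [PositionAgent(i) for i in range(length)]
    history = [score(sequence)]
    for _ in range(iterations):
        # Every agent proposes against the same snapshot (conceptually in parallel).
        proposals = [agent.propose(sequence, window, rng) for agent in agents]
        base = score(sequence)
        new_seq = list(sequence)
        for agent, residue in zip(agents, proposals):
            trial = sequence[:agent.position] + residue + sequence[agent.position + 1:]
            delta = score(trial) - base
            agent.memory.append((residue, delta))  # feedback stored for later rounds
            if delta >= 0:  # accept non-worsening single-site changes
                new_seq[agent.position] = residue
        sequence = "".join(new_seq)
        history.append(score(sequence))
    return sequence, history


if __name__ == "__main__":
    designed, trace = run_swarm()
    print(designed, f"{trace[0]:.2f} -> {trace[-1]:.2f}")
```

Because this toy objective is additive over positions, merging independently accepted single-site changes never lowers the score. With a real, non-additive objective (for example, a structure predictor's confidence), simultaneous updates can interact, which is one reason memory and feedback across iterations matter. In a real system, `propose` would assemble a prompt from the objective, the neighborhood, and the agent's memory and then query an LLM.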

Beyond Protein Language Models: An Agentic LLM Framework for Mechanistic Enzyme Design

Quantitative Methods

AI helps scientists invent new proteins faster.

24 Nov 2025

Boosting In-Silicon Directed Evolution with Fine-Tuned Protein Language Model and Tree Search

Artificial Intelligence

Designs new proteins by learning from nature's code.

13 Nov 2025

LLM Agent Swarm for Hypothesis-Driven Drug Discovery

Artificial Intelligence

Finds new medicines faster using smart AI teams.

24 Apr 2025

Country of Origin

🇺🇸 United States

Page Count

24 pages