Why LLMs Aren't Scientists Yet: Lessons from Four Autonomous Research Attempts
By: Dhruv Trehan, Paras Chopra
We report a case study of four end-to-end attempts to autonomously generate ML research papers using a pipeline of six LLM agents mapped to stages of the scientific workflow. Three of the four attempts failed during implementation or evaluation. One completed the pipeline, passed both human and multi-AI review, and was accepted to Agents4Science 2025, an inaugural experimental venue that required AI systems as first authors. From these attempts, we document six recurring failure modes: bias toward training-data defaults, implementation drift under execution pressure, memory and context degradation across long-horizon tasks, overexcitement that declares success despite obvious failures, insufficient domain intelligence, and weak scientific taste in experimental design. We conclude by discussing four design principles for more robust AI-scientist systems and the implications for autonomous scientific discovery, and we release all prompts, artifacts, and outputs at https://github.com/Lossfunk/ai-scientist-artefacts-v1.
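
As a concrete illustration of the architecture the abstract describes, here is a minimal Python sketch of a six-agent pipeline in which each agent's output becomes context for the next. The stage names, the `ResearchState` container, and the `call_llm` helper are illustrative assumptions, not the authors' actual implementation; the paper's real prompts and agents are in the released artifacts repository.

```python
# Hypothetical sketch of a six-agent research pipeline (stage names assumed;
# the actual agents and prompts are in the authors' released artifacts).
from dataclasses import dataclass, field

STAGES = [
    "ideation",           # propose a research question
    "literature_review",  # survey related work
    "experiment_design",  # plan experiments and baselines
    "implementation",     # write and run code
    "evaluation",         # analyze results against success criteria
    "paper_writing",      # draft the manuscript
]

@dataclass
class ResearchState:
    """Accumulated artifacts passed between agents."""
    artifacts: dict = field(default_factory=dict)

def call_llm(stage: str, context: dict) -> str:
    """Placeholder for an LLM call with a stage-specific prompt."""
    raise NotImplementedError("wire up an LLM client here")

def run_pipeline(topic: str) -> ResearchState:
    state = ResearchState(artifacts={"topic": topic})
    for stage in STAGES:
        # Each agent sees all upstream outputs; long-horizon runs like
        # this are where the paper observes memory/context degradation
        # and implementation drift.
        state.artifacts[stage] = call_llm(stage, state.artifacts)
    return state
```

A linear hand-off like this makes the failure modes above easy to locate: a run that dies at `implementation` or `evaluation` corresponds to the three failed attempts, while a run that reaches `paper_writing` corresponds to the accepted submission.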