TerraFormer: Automated Infrastructure-as-Code with LLMs Fine-Tuned via Policy-Guided Verifier Feedback
By: Prithwish Jana, Sam Davidson, Bhavana Bhasker, and more
Automating Infrastructure-as-Code (IaC) is challenging, and large language models (LLMs) often produce incorrect configurations from natural language (NL). We present TerraFormer, a neuro-symbolic framework for IaC generation and mutation that combines supervised fine-tuning with verifier-guided reinforcement learning, using formal verification tools to provide feedback on syntax, deployability, and policy compliance. We curate two large, high-quality NL-to-IaC datasets, TF-Gen (152k instances) and TF-Mutn (52k instances), via multi-stage verification and iterative LLM self-correction. Evaluations against 17 state-of-the-art LLMs, including ~50x larger models such as Sonnet 3.7, DeepSeek-R1, and GPT-4.1, show that TerraFormer improves correctness over its base LLM by 15.94% on IaC-Eval, 11.65% on TF-Gen (Test), and 19.60% on TF-Mutn (Test). It outperforms larger models on both TF-Gen (Test) and TF-Mutn (Test), ranks third on IaC-Eval, and achieves the strongest best-practices and security compliance.
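To make the verifier-guided feedback concrete, the reward for reinforcement learning can be pictured as a scalar computed from standard Terraform tooling. The Python sketch below is a hypothetical illustration, not TerraFormer's actual implementation: the specific tools (`terraform validate` for syntax, `terraform plan` as a deployability proxy, `conftest` for policy checks) and the reward weights are assumptions; the abstract only states that feedback covers syntax, deployability, and policy compliance.

```python
# Hypothetical sketch of a verifier-guided reward for RL fine-tuning.
# Tool choices and weights are assumptions, not TerraFormer's method.
import json
import subprocess
import tempfile
from pathlib import Path


def verifier_reward(hcl_config: str) -> float:
    """Score a generated Terraform config on three verifier signals."""
    with tempfile.TemporaryDirectory() as workdir:
        Path(workdir, "main.tf").write_text(hcl_config)

        # 1. Syntax/type check: `terraform validate -json` (after init).
        subprocess.run(["terraform", "init", "-backend=false"],
                       cwd=workdir, capture_output=True)
        validate = subprocess.run(["terraform", "validate", "-json"],
                                  cwd=workdir, capture_output=True, text=True)
        syntax_ok = json.loads(validate.stdout or "{}").get("valid", False)

        # 2. Deployability proxy: does a plan succeed without applying?
        plan = subprocess.run(["terraform", "plan", "-input=false"],
                              cwd=workdir, capture_output=True)
        plan_ok = plan.returncode == 0

        # 3. Policy compliance via an external checker, e.g. conftest
        #    run against OPA/Rego policies (an assumed stand-in here).
        policy = subprocess.run(["conftest", "test", "main.tf"],
                                cwd=workdir, capture_output=True)
        policy_ok = policy.returncode == 0

    # Weighted combination; the weights are purely illustrative.
    return 0.3 * syntax_ok + 0.4 * plan_ok + 0.3 * policy_ok
```

In an RL loop of this kind, such a scalar would score each sampled completion before the policy update, so the model is rewarded only for configurations that pass the formal checks rather than merely resembling reference code.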