CTIArena: Benchmarking LLM Knowledge and Reasoning Across Heterogeneous Cyber Threat Intelligence

Published: October 13, 2025 | arXiv ID: 2510.11974v1

By: Yutong Cheng, Yang Liu, Changze Li and more

Potential Business Impact:

Helps computers understand cyber threats better.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Cyber threat intelligence (CTI) is central to modern cybersecurity, providing critical insights for detecting and mitigating evolving threats. With the natural language understanding and reasoning capabilities of large language models (LLMs), there is increasing interest in applying them to CTI, which calls for benchmarks that can rigorously evaluate their performance. Several early efforts have studied LLMs on some CTI tasks but remain limited: (i) they adopt only closed-book settings, relying on parametric knowledge without leveraging CTI knowledge bases; (ii) they cover only a narrow set of tasks, lacking a systematic view of the CTI landscape; and (iii) they restrict evaluation to single-source analysis, unlike realistic scenarios that require reasoning across multiple sources. To fill these gaps, we present CTIArena, the first benchmark for evaluating LLM performance on heterogeneous, multi-source CTI under knowledge-augmented settings. CTIArena spans three categories (structured, unstructured, and hybrid), which are further divided into nine tasks that capture the breadth of CTI analysis in modern security operations. We evaluate ten widely used LLMs and find that most struggle in closed-book setups but show noticeable gains when augmented with security-specific knowledge through our designed retrieval-augmented techniques. These findings highlight the limitations of general-purpose LLMs and the need for domain-tailored techniques to fully unlock their potential for CTI.

AthenaBench: A Dynamic Benchmark for Evaluating LLMs in Cyber Threat Intelligence

Cryptography and Security

Helps computers understand computer attack dangers better.

3 Nov 2025 2

89%

Advancing Autonomous Incident Response: Leveraging LLMs and Cyber Threat Intelligence

Cryptography and Security

Helps computers fight cyber threats faster.

14 Aug 2025 0

89%

Cybersecurity AI Benchmark (CAIBench): A Meta-Benchmark for Evaluating Cybersecurity AI Agents

Cryptography and Security

Tests AI's real cybersecurity skills, not just knowledge.

28 Oct 2025 2

Page Count

36 pages
