ReGAIN: Retrieval-Grounded AI Framework for Network Traffic Analysis
By: Shaghayegh Shajarian, Kennedy Marsh, James Benson and more
Potential Business Impact:
Finds network problems with clear, trustworthy answers.
Modern networks generate vast, heterogeneous traffic that must be continuously analyzed for security and performance. Traditional network traffic analysis systems, whether rule-based or machine learning-driven, often suffer from high false-positive rates and lack interpretability, which limits analyst trust. In this paper, we present ReGAIN, a multi-stage framework that combines traffic summarization, retrieval-augmented generation (RAG), and large language model (LLM) reasoning for transparent and accurate network traffic analysis. ReGAIN creates natural-language summaries from network traffic, embeds them into a multi-collection vector database, and uses a hierarchical retrieval pipeline to ground LLM responses with evidence citations. The pipeline features metadata-based filtering, Maximal Marginal Relevance (MMR) sampling, a two-stage cross-encoder reranking mechanism, and an abstention mechanism to reduce hallucinations and ensure grounded reasoning. Evaluated on ICMP ping flood and TCP SYN flood traces from a real-world traffic dataset, ReGAIN demonstrates robust performance, achieving accuracy between 95.95% and 98.82% across different attack types and evaluation benchmarks. These results are validated against two complementary sources: dataset ground truth and human expert assessments. ReGAIN also outperforms rule-based, classical ML, and deep learning baselines while providing explainability through trustworthy, verifiable responses.
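To make two of the retrieval steps named in the abstract more concrete, the following is a minimal sketch of MMR sampling and a score-threshold abstention check. It is not ReGAIN's implementation: the function names (`mmr_select`, `should_abstain`), the cosine similarity measure, the `lambda_` trade-off weight, and the `min_score` threshold are all assumptions chosen for illustration, and the abstract does not say how the paper configures either step.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(query_vec, doc_vecs, k, lambda_=0.7):
    """Greedy Maximal Marginal Relevance selection.

    Returns the indices of up to k documents, in selection order.
    lambda_ = 1.0 ranks purely by relevance to the query; lower values
    penalize similarity to documents that were already selected.
    """
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    selected = []
    candidates = list(range(len(doc_vecs)))

    while candidates and len(selected) < k:
        def score(i):
            # Redundancy is the highest similarity to any already-selected document.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_ * relevance[i] - (1.0 - lambda_) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)

    return selected


def should_abstain(rerank_scores, min_score=0.5):
    """Hypothetical abstention rule: decline to answer when no retrieved
    evidence reaches min_score (or when nothing was retrieved at all)."""
    return not rerank_scores or max(rerank_scores) < min_score


# Toy 2-D embeddings: documents 0 and 1 are near-duplicates, document 2 differs.
query = [1.0, 0.0]
docs = [[0.9, 0.1], [0.9, 0.12], [0.5, 0.5]]

relevance_only = mmr_select(query, docs, k=2, lambda_=1.0)
diverse = mmr_select(query, docs, k=2, lambda_=0.3)
```

With these toy vectors, pure relevance ranking keeps both near-duplicates, while the lower `lambda_` swaps the second one for the dissimilar document. In a grounded pipeline, a check like `should_abstain` would run on the reranker's scores before the LLM is asked to answer.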
Similar Papers
MalRAG: A Retrieval-Augmented LLM Framework for Open-set Malicious Traffic Identification
Cryptography and Security
Finds new computer viruses automatically.
Retrieval Augmented Generation with Multi-Modal LLM Framework for Wireless Environments
Networking and Internet Architecture
Makes wireless internet faster and more reliable.
Financial Analysis: Intelligent Financial Data Analysis System Based on LLM-RAG
Statistical Finance
Helps computers understand money news faster.