Interpretable Ransomware Detection Using Hybrid Large Language Models: A Comparative Analysis of BERT, RoBERTa, and DeBERTa Through LIME and SHAP
By: Elodie Mutombo Ngoie, Mike Nkongolo Wa Nkongolo, Peace Azugo, and more
Potential Business Impact:
Helps computers spot ransomware faster and shows why.
Ransomware continues to evolve in complexity, making early and explainable detection a critical requirement for modern cybersecurity systems. This study presents a comparative analysis of three Transformer-based Large Language Models (LLMs), namely BERT, RoBERTa, and DeBERTa, for ransomware detection using two structured datasets: UGRansome and Process Memory (PM). Since LLMs are primarily designed for natural language processing (NLP), numerical and categorical ransomware features were transformed into textual sequences using KBinsDiscretizer and token-based encoding. This enabled the models to learn behavioural patterns from system activity and network traffic through contextual embeddings. The models were fine-tuned on approximately 2,500 labelled samples and evaluated using accuracy, F1-score, and ROC-AUC. To ensure transparent decision-making in this high-stakes domain, two explainable AI (XAI) techniques, LIME and SHAP, were applied to interpret feature contributions. The results show that the models learn distinct ransomware-related cues: BERT relies heavily on dominant file-operation features, RoBERTa demonstrates a balanced reliance on network and financial signals, and DeBERTa exhibits strong sensitivity to financial and network-traffic indicators. Visualisation of embeddings further reveals structural differences in token representation, with RoBERTa producing more isotropic embeddings and DeBERTa capturing highly directional, disentangled patterns. Overall, RoBERTa achieved the strongest F1-score, while BERT yielded the highest ROC-AUC. The integration of LLMs with XAI provides a transparent framework capable of identifying the feature-level evidence behind ransomware predictions.
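As a concrete illustration of the feature-to-text step described in the abstract, the sketch below shows how numeric columns can be discretised with scikit-learn's KBinsDiscretizer and mapped to word-like tokens. The column names, bin labels, and sample values are illustrative assumptions, not the paper's actual UGRansome or PM schema.

```python
# A minimal sketch of the feature-to-text step, assuming hypothetical columns.
import pandas as pd
from sklearn.preprocessing import KBinsDiscretizer

# Toy stand-in for UGRansome-style records (illustrative columns only).
df = pd.DataFrame({
    "bytes_transferred": [1200, 98000, 450, 720000],
    "btc_amount": [0.0, 1.5, 0.02, 3.7],
    "protocol": ["TCP", "UDP", "TCP", "TCP"],
})

num_cols = ["bytes_transferred", "btc_amount"]
disc = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="quantile")
binned = disc.fit_transform(df[num_cols]).astype(int)

levels = ["low", "medium", "high"]

def to_sentence(i):
    # Map each binned numeric feature to a token like "btc_amount_high",
    # then append categorical values verbatim as tokens.
    tokens = [f"{c}_{levels[binned[i, j]]}" for j, c in enumerate(num_cols)]
    tokens.append(f"protocol_{df.loc[i, 'protocol']}")
    return " ".join(tokens)

texts = [to_sentence(i) for i in range(len(df))]
print(texts[0])  # e.g. "bytes_transferred_low btc_amount_low protocol_TCP"
```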
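Such token sequences can then be fed to any of the three models for fine-tuning. The following is a hedged sketch using the Hugging Face Transformers library; the checkpoint name, label scheme, learning rate, and single-step loop are placeholder assumptions rather than the paper's training setup.

```python
# One fine-tuning step on the generated text sequences (assumed setup).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder checkpoint; RoBERTa and DeBERTa would be fine-tuned analogously.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

texts = ["bytes_transferred_high btc_amount_high protocol_TCP"]  # from the step above
labels = torch.tensor([1])  # assumed encoding: 1 = ransomware, 0 = benign

enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
loss = model(**enc, labels=labels).loss
loss.backward()
optimizer.step()  # one step of an otherwise standard fine-tuning loop
```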
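Finally, a sketch of how LIME's text explainer can surface the per-token evidence the abstract refers to. The wrapper function and class names are assumptions layered on the snippet above, not the paper's exact procedure; the SHAP analysis would follow a similar pattern with shap.Explainer.

```python
# Explaining one prediction with LIME (illustrative, not the paper's exact code).
import torch
from lime.lime_text import LimeTextExplainer

def predict_proba(batch):
    # Wrap the fine-tuned classifier so LIME can query class probabilities.
    enc = tokenizer(list(batch), padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits
    return torch.softmax(logits, dim=-1).numpy()

explainer = LimeTextExplainer(class_names=["benign", "ransomware"])
explanation = explainer.explain_instance(texts[0], predict_proba, num_features=5)
print(explanation.as_list())  # (token, weight) pairs: feature-level evidence
```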
Similar Papers
Misinformation Detection using Large Language Models with Explainability
Computation and Language
Finds fake news online and shows why.
AnomalyExplainer: Explainable AI for LLM-based anomaly detection using BERTViz and Captum
Machine Learning (CS)
Helps computers find and explain online dangers faster.
Improving Crash Data Quality with Large Language Models: Evidence from Secondary Crash Narratives in Kentucky
Computation and Language
Finds hidden car crash causes in police reports.