The Dark Side of the Web: Towards Understanding Various Data Sources in Cyber Threat Intelligence
By: Saskia Laura Schröer, Noé Canevascini, Irdin Pekaric, and more
Potential Business Impact:
Finds secret crime plans from hidden internet chats.
Cyber threats have become increasingly prevalent and sophisticated. Prior work has extracted actionable cyber threat intelligence (CTI), such as indicators of compromise, tactics, techniques, and procedures (TTPs), or threat feeds, from various sources: open-source data (e.g., social networks), internal intelligence (e.g., log data), and "first-hand" communications from cybercriminals (e.g., underground forums, chats, darknet websites). However, "first-hand" data sources remain underutilized because their data are difficult to access or scrape. In this work, we analyze (i) 6.6 million underground forum posts, (ii) 3.4 million chat messages, and (iii) 120,000 darknet websites. We combine NLP tools to address several challenges in analyzing such data. First, even on dedicated platforms, only some content is CTI-relevant, which requires effective filtering. Second, "first-hand" data can be CTI-relevant from either a technical or a strategic viewpoint, and we demonstrate how to organize content along this distinction. Third, we describe the topics discussed and how "first-hand" data sources differ from each other. According to our filtering, 20% of our sample is CTI-relevant. Most of the CTI-relevant data focuses on strategic rather than technical discussions. Credit-card-related crime is the most prevalent topic on darknet websites, whereas account and subscription selling is discussed most on underground forums and chat channels. Topic diversity is higher on underground forums and chat channels than on darknet websites. Our analyses suggest that different platforms may be used for activities of varying complexity and risk for criminals.
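The abstract compares topic diversity across platforms but does not say which diversity measure the authors used. As a purely illustrative sketch, one common way to quantify the diversity of a topic distribution is its Shannon entropy: a corpus dominated by a single topic scores low, and a corpus spread evenly over many topics scores high. The function name `topic_entropy`, the topic labels, and the toy counts below are hypothetical and are not data or code from the paper.

```python
from collections import Counter
import math


def topic_entropy(topic_labels):
    """Shannon entropy (in bits) of the topic distribution of a corpus.

    Illustrative diversity measure only; not necessarily the one used in the paper.
    """
    counts = Counter(topic_labels)
    total = sum(counts.values())
    # Sum of -p * log2(p) over every observed topic.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical toy labels, NOT data from the paper.
darknet_sites = ["credit_cards"] * 6 + ["drugs"] * 2        # skewed toward one topic
forum_posts = ["accounts", "credit_cards", "malware", "drugs"] * 2  # evenly spread

print(f"darknet websites: {topic_entropy(darknet_sites):.3f} bits")
print(f"forum posts:      {topic_entropy(forum_posts):.3f} bits")
```

Under these toy inputs, the evenly spread forum sample reaches the maximum of 2 bits for four topics, while the skewed darknet sample scores about 0.811 bits. Comparing corpora with different numbers of topics would typically call for a normalized variant (entropy divided by log2 of the topic count).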
Similar Papers
Identification of Malicious Posts on the Dark Web Using Supervised Machine Learning
Cryptography and Security
Finds bad guys talking on the dark web.
CTI Dataset Construction from Telegram
Cryptography and Security
Finds online dangers from chat messages.
CTI-HAL: A Human-Annotated Dataset for Cyber Threat Intelligence Analysis
Cryptography and Security
Helps computers understand online threats faster.