Breaking BERT: Gradient Attack on Twitter Sentiment Analysis for Targeted Misclassification

Published: April 2, 2025 | arXiv ID: 2504.01345v1

By: Akil Raj Subedi, Taniya Shah, Aswani Kumar Cherukuri and more

Potential Business Impact:

Tricks computers into thinking fake reviews are real.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Social media platforms like Twitter increasingly rely on Natural Language Processing (NLP) techniques to analyze the sentiments expressed in user-generated content. One state-of-the-art NLP model, Bidirectional Encoder Representations from Transformers (BERT), has been widely adopted for sentiment analysis. However, BERT is susceptible to adversarial attacks. This paper scrutinizes the inherent vulnerabilities of such models in Twitter sentiment analysis and formulates a framework for constructing targeted adversarial texts that deceive these models while remaining stealthy. In contrast to conventional methodologies such as Importance Reweighting, the framework's core idea is to use gradients to rank the importance of individual words within the text. It takes a white-box approach to attain fine-grained sensitivity, pinpointing the words that exert maximal influence on the classification outcome. The paper is organized into three interdependent phases. It first fine-tunes a pre-trained BERT model on Twitter data. It then analyzes the model's gradients to rank words by importance and iteratively replaces the top-ranked words with feasible candidates until an acceptable solution is found. Finally, it evaluates the effectiveness of the adversarial text against the custom-trained sentiment classification model. This assessment helps gauge the capacity of the adversarial text to subvert classification without raising any alarm.
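The rank-then-replace loop described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it swaps BERT for a toy linear classifier over mean-pooled word vectors so that the gradient can be written by hand, and it scores words with gradient-times-input saliency, which is one common gradient-based ranking and only an assumption about what the paper uses. The embeddings `EMB`, the weights `W` and `B`, the `CANDIDATES` table, and the `max_swaps` budget are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical 2-d word vectors standing in for a real model's embedding layer.
EMB = {
    "the": [0.0, 0.1], "movie": [0.1, 0.0], "was": [0.0, 0.0],
    "great": [2.0, 0.5], "superb": [1.8, 0.4], "unreal": [-1.5, -0.4],
}
W, B = [1.0, 0.5], 0.0  # toy linear classifier on the mean embedding
# Hypothetical replacement candidates (a real attack would draw near-synonyms).
CANDIDATES = {"great": ["superb", "unreal"]}


def prob_positive(words):
    """P(positive) = sigmoid(W . mean(embeddings) + B)."""
    n = len(words)
    z = B + sum(W[k] * sum(EMB[w][k] for w in words) / n for k in range(len(W)))
    return 1.0 / (1.0 + math.exp(-z))


def target_prob(words, target):
    """Probability the toy model assigns to the attacker's target label (0 or 1)."""
    p = prob_positive(words)
    return p if target == 1 else 1.0 - p


def rank_by_gradient(words, target):
    """Return word positions sorted by |grad_i . e_i|, most influential first."""
    p, n = prob_positive(words), len(words)
    # For this toy model, d(-log P(target)) / d e_i = (p - target) * W / n.
    grad = [(p - target) * wk / n for wk in W]
    scores = [abs(sum(g * e for g, e in zip(grad, EMB[w]))) for w in words]
    return sorted(range(n), key=lambda i: -scores[i])


def attack(words, target, max_swaps=2):
    """Greedily replace top-ranked words until the target label is predicted."""
    words, swaps = list(words), 0
    for i in rank_by_gradient(words, target):
        if target_prob(words, target) > 0.5 or swaps >= max_swaps:
            break  # success, or the edit budget (a crude stealth proxy) is spent
        trial = lambda c: target_prob(words[:i] + [c] + words[i + 1:], target)
        best = max(CANDIDATES.get(words[i], []), key=trial, default=None)
        if best is not None and trial(best) > target_prob(words, target):
            words[i] = best
            swaps += 1
    return words, target_prob(words, target) > 0.5


adversarial, success = attack(["the", "movie", "was", "great"], target=0)
print(adversarial, success)
```

In this toy setting the ranking is computed once before any replacement. With a real fine-tuned BERT model the gradients would come from backpropagation to the input embeddings, and they would likely need to be recomputed after each substitution.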

TWSSenti: A Novel Hybrid Framework for Topic-Wise Sentiment Analysis on Social Media Using Transformer Models

Computation and Language

Reads feelings from online words better.

14 Apr 2025 0

88%

Predicting Stock Movement with BERTweet and Transformers

Machine Learning (CS)

Helps predict stock prices using Twitter.

13 Mar 2025 1

87%

The Application of Transformer-Based Models for Predicting Consequences of Cyber Attacks

Machine Learning (CS)

Predicts cyberattack damage to stop them.

18 Aug 2025 0

Page Count

14 pages
