
Sarcasm Detection on Reddit Using Classical Machine Learning and Feature Engineering

Published: December 4, 2025 | arXiv ID: 2512.04396v1

By: Subrata Karmaker

Potential Business Impact:

Helps computers guess if online words are sarcastic.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Sarcasm is common in online discussions, yet difficult for machines to identify because the intended meaning often contradicts the literal wording. In this work, I study sarcasm detection using only classical machine learning methods and explicit feature engineering, without relying on neural networks or context from parent comments. Using a 100,000-comment subsample of the Self-Annotated Reddit Corpus (SARC 2.0), I combine word-level and character-level TF-IDF features with simple stylistic indicators. Four models are evaluated: logistic regression, a linear SVM, multinomial Naive Bayes, and a random forest. Naive Bayes and logistic regression perform best, achieving F1-scores of around 0.57 on the sarcastic class. Although the lack of conversational context limits performance, the results offer a clear and reproducible baseline for sarcasm detection using lightweight and interpretable methods.
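
The feature combination described in the abstract (word-level TF-IDF, character-level TF-IDF, and simple stylistic indicators) can be illustrated with a minimal standard-library sketch. This is not the author's code: the tokenizer, the character n-gram size of 3, the smoothed IDF formula, the four stylistic indicators, and all function names are assumptions chosen for illustration, since the abstract does not specify them.

```python
import math
import re
from collections import Counter


def word_tokens(text):
    # Lowercased word tokens (assumed tokenizer, not from the paper).
    return re.findall(r"[a-z0-9']+", text.lower())


def char_ngrams(text, n=3):
    # Overlapping character n-grams; n=3 is an assumed setting.
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def stylistic_features(text):
    # Hypothetical stylistic indicators; the paper's exact set is not given here.
    words = text.split()
    return {
        "style:exclamations": text.count("!"),
        "style:question_marks": text.count("?"),
        "style:all_caps_words": sum(1 for w in words if len(w) > 1 and w.isupper()),
        "style:length_chars": len(text),
    }


def fit_idf(token_lists):
    # Smoothed inverse document frequency: ln((1 + N) / (1 + df)) + 1.
    n_docs = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    return {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf_vector(tokens, idf):
    # Term frequency times IDF, L2-normalized; unseen tokens are dropped.
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def featurize(text, word_idf, char_idf):
    # Concatenate the three feature groups into one sparse dict.
    feats = {}
    for t, v in tfidf_vector(word_tokens(text), word_idf).items():
        feats["w:" + t] = v
    for t, v in tfidf_vector(char_ngrams(text), char_idf).items():
        feats["c:" + t] = v
    feats.update(stylistic_features(text))
    return feats


if __name__ == "__main__":
    # Toy comments invented for this example; they are not from SARC 2.0.
    comments = ["Oh GREAT, another Monday!!", "I like this movie", "great movie"]
    word_idf = fit_idf([word_tokens(c) for c in comments])
    char_idf = fit_idf([char_ngrams(c) for c in comments])
    features = featurize(comments[0], word_idf, char_idf)
    print({k: v for k, v in features.items() if k.startswith("style:")})
```

Sparse feature dictionaries of this kind could then be passed to any of the four classifiers named in the abstract. Prefixing the keys keeps the word, character, and stylistic groups separable, which helps when inspecting which features a linear model weights most heavily.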

Sarcasm Detection Using Deep Convolutional Neural Networks: A Modular Deep Learning Framework

Computation and Language

Helps computers understand when people are joking.

12 Oct 2025 0

88%

Sarc7: Evaluating Sarcasm Detection and Generation with Seven Types and Emotion-Informed Techniques

Computation and Language

Helps computers understand jokes and teasing better.

31 May 2025 1

88%

Transfer Learning via Lexical Relatedness: A Sarcasm and Hate Speech Case Study

Computation and Language

Helps computers find hidden mean words online.

22 Aug 2025 0


Page Count

11 pages
