Algerian Dialect

Published: December 22, 2025 | arXiv ID: 2512.19543v1

By: Zakaria Benmounah, Abdennour Boulesnane

Potential Business Impact:

Helps computers understand feelings in Algerian videos.

Business Areas:
Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

We present Algerian Dialect, a large-scale sentiment-annotated dataset consisting of 45,000 YouTube comments written in the Algerian Arabic dialect. The comments were collected from more than 30 Algerian press and media channels using the YouTube Data API. Each comment is manually assigned to one of five sentiment categories: very negative, negative, neutral, positive, and very positive. In addition to sentiment labels, the dataset includes rich metadata such as collection timestamps, like counts, video URLs, and annotation dates. This dataset addresses the scarcity of publicly available resources for the Algerian dialect and aims to support research in sentiment analysis, dialectal Arabic NLP, and social media analytics. The dataset is publicly available on Mendeley Data under a CC BY 4.0 license at https://doi.org/10.17632/zzwg3nnhsz.2.
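
As a rough illustration of how the five-category label scheme might be handled in practice, the Python sketch below maps each sentiment category to an ordinal score and counts the label distribution. The abstract does not specify the file layout, so the column names `comment` and `sentiment`, the helper names, and the sample rows are all hypothetical placeholders rather than the dataset's actual schema or contents.

```python
import csv
import io
from collections import Counter

# The five sentiment categories named in the abstract, mapped to an ordinal
# scale. The numeric values are an assumption made for illustration only.
SENTIMENT_SCALE = {
    "very negative": -2,
    "negative": -1,
    "neutral": 0,
    "positive": 1,
    "very positive": 2,
}


def load_rows(handle, text_col="comment", label_col="sentiment"):
    """Read CSV rows, normalise label casing, and skip unrecognised labels.

    The column names are hypothetical; adjust them to the real file headers.
    """
    rows = []
    for record in csv.DictReader(handle):
        label = record[label_col].strip().lower()
        if label not in SENTIMENT_SCALE:
            continue  # ignore rows whose label is outside the five categories
        rows.append(
            {
                "text": record[text_col],
                "label": label,
                "score": SENTIMENT_SCALE[label],
            }
        )
    return rows


def label_distribution(rows):
    """Count how many rows fall into each sentiment category."""
    return Counter(row["label"] for row in rows)


# Placeholder rows standing in for real data (not taken from the dataset).
SAMPLE_CSV = """comment,sentiment
placeholder comment A,Very Positive
placeholder comment B,negative
placeholder comment C,neutral
placeholder comment D,unknown
"""

rows = load_rows(io.StringIO(SAMPLE_CSV))
dist = label_distribution(rows)
```

To run this against the downloaded files, one would replace `io.StringIO(SAMPLE_CSV)` with an opened file handle and set `text_col` and `label_col` to the actual header names.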

Page Count
4 pages

Category
Computer Science:
Computation and Language