Authors: Ibad-ur-Rehman Rashid and Muhammad Hashir Khalil
Venue: Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks, Suzhou, China (ArabicNLP 2025).
Publisher: Association for Computational Linguistics (ACL)
Abstract
This paper details our submission to Subtask 1 of the MAHED Shared Task 2025, which focuses on classifying Arabic social media content concerning the Middle East conflict as “hope”, “hate”, or “not applicable”. To handle the linguistic diversity and dialect variation in the dataset, we developed and evaluated two distinct model architectures built on ArabicBERT.
Methodology
Our approach tested the efficacy of specialized preprocessing pipelines alongside two primary architectures:
- Multiclass Classifier: A fine-tuned ArabicBERT model directly performing multiclass classification.
- Stacked Meta-Classifier: A two-step stacked architecture where two separate binary ArabicBERT models (one for hope, one for hate) feed their probability outputs into a final logistic regression meta-classifier.
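The second stage of the stacked architecture can be sketched as follows. This is a minimal illustration, not the authors' code: the two binary ArabicBERT models are not run here, the `(p_hope, p_hate)` pairs are made-up placeholder values, and the meta-classifier is a small hand-rolled multinomial logistic regression standing in for whatever library implementation was actually used. All function names are hypothetical.

```python
import math

LABELS = ["hope", "hate", "not_applicable"]

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(weights, features):
    """weights holds one [w_hope, w_hate, bias] row per output class."""
    p_hope, p_hate = features
    return softmax([w[0] * p_hope + w[1] * p_hate + w[2] for w in weights])

def fit_meta_classifier(features, labels, epochs=2000, lr=1.0):
    """Fit multinomial logistic regression with batch gradient descent."""
    weights = [[0.0, 0.0, 0.0] for _ in LABELS]
    n = len(features)
    for _ in range(epochs):
        grads = [[0.0, 0.0, 0.0] for _ in LABELS]
        for x, y in zip(features, labels):
            probs = predict_proba(weights, x)
            for k in range(len(LABELS)):
                err = probs[k] - (1.0 if k == y else 0.0)
                grads[k][0] += err * x[0]
                grads[k][1] += err * x[1]
                grads[k][2] += err
        for k in range(len(LABELS)):
            for j in range(3):
                weights[k][j] -= lr * grads[k][j] / n
    return weights

def predict(weights, features):
    probs = predict_proba(weights, features)
    return LABELS[probs.index(max(probs))]

# Placeholder outputs of the two binary base models: (p_hope, p_hate).
train_features = [
    (0.90, 0.10), (0.80, 0.20), (0.85, 0.05),   # gold label: hope
    (0.10, 0.90), (0.20, 0.80), (0.05, 0.95),   # gold label: hate
    (0.10, 0.10), (0.20, 0.15), (0.15, 0.20),   # gold label: not_applicable
]
train_labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]      # indices into LABELS

meta_weights = fit_meta_classifier(train_features, train_labels)
```

The meta-classifier sees only two numbers per post, so it is cheap to train and can learn, for example, that low confidence from both base models points to "not applicable". In practice the base-model probabilities used for training would normally come from held-out data so that the meta-classifier does not inherit overfitting from the first stage.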
We also experimented extensively with strategies for handling class imbalance (oversampling, undersampling, and adjusted class weights) and with different preprocessing configurations, such as emoji removal.
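As an illustration of class weighting, the sketch below computes inverse-frequency weights and applies them to a per-example cross-entropy loss. The summary does not state which weights the authors settled on, so the inverse-frequency formula is an assumption: it is a common starting point that is then tuned by hand. The label counts and function names are invented for the example.

```python
import math
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}

def weighted_cross_entropy(probs, gold, weights):
    """Cross-entropy for one example, scaled by the gold label's weight."""
    return -weights[gold] * math.log(probs[gold])

# Invented, deliberately imbalanced label distribution.
labels = ["not_applicable"] * 6 + ["hope"] * 3 + ["hate"] * 1
weights = balanced_class_weights(labels)

# The same prediction error costs more on the rare class.
rare_loss = weighted_cross_entropy({"hate": 0.5}, "hate", weights)
common_loss = weighted_cross_entropy(
    {"not_applicable": 0.5}, "not_applicable", weights)
```

Unlike oversampling or undersampling, weighting leaves the training set unchanged and only rescales how much each mistake contributes to the loss.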
Key Findings
We found that Arabic-specific preprocessing was the most significant driver of performance. Standard undersampling and oversampling led to overfitting, whereas carefully adjusted class weights produced stable performance gains.
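For readers unfamiliar with Arabic text normalization, the following sketch shows the kind of steps such a pipeline often includes. Only emoji removal is named in this summary. The other steps (stripping diacritics, removing tatweel, unifying alef variants) are common practice and are assumptions here, not a description of the authors' pipeline. The emoji pattern covers the main Unicode emoji blocks and is not exhaustive.

```python
import re

# Approximate emoji coverage: flags, pictographs, misc symbols, joiners.
EMOJI = re.compile(
    "[\U0001F1E6-\U0001F1FF\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F\u200D]"
)
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")   # tashkeel marks
TATWEEL = re.compile("\u0640")                     # elongation character
ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")  # alef with madda/hamza

def normalize_arabic(text, remove_emoji=True):
    """Apply a simple, assumed normalization pipeline to one post."""
    if remove_emoji:
        text = EMOJI.sub(" ", text)
    text = DIACRITICS.sub("", text)
    text = TATWEEL.sub("", text)
    text = ALEF_VARIANTS.sub("\u0627", text)       # map to bare alef
    return re.sub(r"\s+", " ", text).strip()       # collapse whitespace
```

The `remove_emoji` flag makes it easy to compare configurations with and without emoji removal, which is the kind of ablation described above.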
Although the difference between the single multiclass classifier and the stacked architecture was minimal in initial testing, optimizing the stacked classifier’s preprocessing pipeline ultimately yielded a notable improvement, reaching an F1 score of 0.91 on the official test set.
