Researchers Tackle Annotator Disagreement to Improve Hate Speech Classification Accuracy

A new research paper from Dehghan, Sen, and Yanikoglu explores the challenge of annotator disagreement in hate speech classification. The authors evaluate aggregation methods like majority voting and ordinal strategies, demonstrating that filtering non-consensus samples leads to over-optimistic results and that leveraging perceived hate speech strength enhances performance. They establish new state-of-the-art results for Turkish tweets.

iGEN Editorial

June 16, 2026


Hate speech detection is a critical task, especially on social media, where harmful content spreads quickly. However, the inherently subjective nature of hate speech leads to frequent disagreement among annotators, particularly on subtle or borderline content, according to a new study by Somaiyeh Dehghan, Mehmet Umut Sen, and Berrin Yanikoglu. Their paper, published on arXiv, examines this largely overlooked problem and evaluates a range of aggregation methods for handling annotator disagreement.

The Problem of Annotator Disagreement

The researchers note that traditional approaches often discard non-consensus samples or force a 'gold standard' through expert adjudication, ignoring valuable information about uncertainty and diverse human perspectives. This practice can bias models and produce over-optimistic results. The study analyzes majority voting and ordinal strategies (minimum, maximum, and mean), along with their impact on binary, 4-class, and 6-class classification tasks.
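To make the aggregation strategies above concrete, here is a minimal sketch of how several annotator labels for one tweet can be collapsed into a single training label. It assumes labels are integers on an ordinal scale where a larger value means more hateful; the scale, the tie-breaking rule for majority voting, the half-up rounding of the mean, and every function name are hypothetical choices for illustration, not the authors' implementation.

```python
from collections import Counter
from math import floor
from statistics import mean

# Hypothetical setup: each tweet has several annotator labels on an ordinal
# scale where a larger integer means more hateful (e.g. 0 = not hateful).


def majority_vote(labels: list[int]) -> int:
    """Most frequent label; ties go to the more severe label (an assumption)."""
    counts = Counter(labels)
    top = max(counts.values())
    return max(label for label, n in counts.items() if n == top)


def ordinal_min(labels: list[int]) -> int:
    """The most lenient annotator's label wins."""
    return min(labels)


def ordinal_max(labels: list[int]) -> int:
    """The most severe annotator's label wins."""
    return max(labels)


def ordinal_mean(labels: list[int]) -> int:
    """Mean label, rounded half up to the nearest class index."""
    return floor(mean(labels) + 0.5)


def has_consensus(labels: list[int]) -> bool:
    """True only when every annotator chose the same label."""
    return len(set(labels)) == 1


AGGREGATORS = {
    "majority": majority_vote,
    "min": ordinal_min,
    "max": ordinal_max,
    "mean": ordinal_mean,
}


def aggregate(annotations: dict[str, list[int]], method: str) -> dict[str, int]:
    """Collapse each sample's annotator labels into a single training label."""
    fn = AGGREGATORS[method]
    return {sample_id: fn(labels) for sample_id, labels in annotations.items()}


if __name__ == "__main__":
    annotations = {
        "tweet_1": [0, 0, 0],  # full agreement
        "tweet_2": [0, 2, 3],  # strong disagreement
        "tweet_3": [1, 1, 3],  # majority with one outlier
    }
    for method in AGGREGATORS:
        print(method, aggregate(annotations, method))
    # Filtering to consensus-only samples keeps just the "easy" tweet.
    kept = [s for s, labels in annotations.items() if has_consensus(labels)]
    print("consensus-only subset:", kept)
```

Note how the three disagreement-free strategies diverge on tweet_2, and how the consensus filter discards two of the three samples. A model evaluated only on the surviving, unambiguous tweet would look better than it would on real, contested content.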

Key Findings: Modeling Disagreement Improves Robustness

The paper demonstrates that filtering out non-consensus samples yields over-optimistic performance estimates. Instead, the authors show that annotator disagreement, when properly modeled, is a valuable resource for building more robust and reliable systems. They also use annotators' perceived hate speech strength scores to explore regression-based and hybrid modeling approaches, finding that perceived strength provides a complementary signal that improves classification performance.

Approach	Description	Impact on Performance
Majority Voting	Assigns the most frequent annotator label	Baseline method
Ordinal (min, max, mean)	Takes the minimum, maximum, or mean of annotators' ordinal labels	Mixed results across tasks
Regression-based	Predicts continuous perceived hate speech strength scores	Enhances classification
Hybrid	Combines classification with strength signals	Achieves new state-of-the-art
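The article does not describe how the hybrid models combine the two signals, so the following is only a sketch of one common multi-task formulation, assumed here for illustration: a model emits class logits plus a scalar strength prediction, and training minimizes a weighted sum of cross-entropy (against the aggregated class) and squared error (against the mean of the annotators' strength scores). The 0-10 strength scale, the weight `alpha`, and all function names are hypothetical.

```python
import math

# Hypothetical multi-task setup: a model emits class logits plus one scalar
# "strength" prediction; annotators supplied a class label and a numeric
# perceived-strength score per tweet (scale assumed, not taken from the paper).


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax (subtract the max logit before exp)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: list[float], target: int) -> float:
    """Classification loss against the aggregated class label."""
    return -math.log(softmax(logits)[target])


def strength_target(scores: list[float]) -> float:
    """Regression target: mean of the annotators' perceived-strength scores."""
    return sum(scores) / len(scores)


def hybrid_loss(
    class_logits: list[float],
    strength_pred: float,
    class_label: int,
    strength_scores: list[float],
    alpha: float = 0.5,
) -> float:
    """Blend classification and regression losses; alpha is a hypothetical weight."""
    ce = cross_entropy(class_logits, class_label)
    se = (strength_pred - strength_target(strength_scores)) ** 2
    return (1 - alpha) * ce + alpha * se


if __name__ == "__main__":
    # One tweet: binary labels (0 = not hateful, 1 = hateful), strength on 0-10.
    loss = hybrid_loss(
        class_logits=[0.2, 1.4],
        strength_pred=6.0,
        class_label=1,
        strength_scores=[5.0, 7.0, 8.0],
    )
    print(f"hybrid loss: {loss:.4f}")
```

Setting `alpha` to 0 recovers a plain classifier and setting it to 1 a pure regression model, so the weight is a convenient knob for testing how much the strength signal helps on a given task.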

State-of-the-Art Results for Turkish Tweets

The researchers applied their methods to Turkish tweets and established new state-of-the-art results for hate speech detection in that language. The study highlights that the perceived strength signal, when incorporated, improves model accuracy and robustness.

Implications for Enterprise AI Applications

Although the study focuses on hate speech, its findings apply to any classification task where human annotation is subjective, including content moderation, customer feedback analysis, and even supply chain risk assessment, where expert judgments may vary. For CTOs and technology leaders building AI systems, the research underscores the value of preserving annotator disagreement rather than discarding it, since doing so can lead to more reliable models.

The paper is available on arXiv and was submitted on February 12, 2025, with multiple revisions through June 2026.
