Artificial Intelligence #reinforcement learning #chain-of-thought
Reinforcement Learning with Chain-of-Thought Supervision Boosts Hateful Meme Detection Accuracy by Over 2 Percentage Points
A new reinforcement learning-based post-training method that combines Group Relative Policy Optimization (GRPO) with chain-of-thought supervision improves detection of hateful and propagandistic memes. On the FHM benchmark, accuracy rose from 79.9% to 82.0%, a gain of 2.1 percentage points; on ArMeme, macro-F1 increased by 0.076 (7.6 points) to 0.612. The approach also generates natural-language explanations for its predictions.
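For readers unfamiliar with Group Relative Policy Optimization, its core idea is to score each sampled response against the other responses drawn for the same input rather than against a learned value function. The sketch below illustrates only that normalization step. The binary correctness reward, the group of four samples, and the function name `group_relative_advantages` are illustrative assumptions, not details reported for this method.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and spread of its own group.

    `rewards` holds one scalar per sampled response to the same input.
    Using the population standard deviation is an assumption of this sketch.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division defined when every response gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one meme:
# 1.0 if the predicted label matched the gold label, else 0.0.
rewards = [1.0, 0.0, 1.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Responses that beat their group's average receive a positive advantage and are reinforced; those below it are discouraged. When every response in a group earns the same reward, all advantages are zero and that group contributes no learning signal.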
Jun 16, 2026 1 source