Topic
fairness
New Auditing Framework Detects Synthetic Data Privacy Leaks Without Model Access
A new causal framework for auditing synthetic data detects privacy leaks by distinguishing true disclosures from phantom ones. It uses statistical hypothesis testing with holdout sets, requires no model access or canary insertion, and is orders of magnitude more efficient than shadow-model approaches.
New Benchmark 'AgentFairBench' Tests Whether LLM Agents Discriminate in Real Actions
Researchers introduce AgentFairBench, a reproducible benchmark for demographic disparity in LLM agent actions. Unlike traditional fairness tests that grade answers, it evaluates actions across hiring, lending, and medical triage using counterfactual matched sets. A pilot study with 864 decisions reveals that naively comparing score spreads can overstate disparity by ~2.4X; using a proper null methodology, Claude Haiku 4.5 showed no significant demographic effect.
Researchers Tackle Annotator Disagreement to Improve Hate Speech Classification Accuracy
A new research paper from Dehghan, Sen, and Yanikoglu explores the challenge of annotator disagreement in hate speech classification. The authors evaluate aggregation methods like majority voting and ordinal strategies, demonstrating that filtering non-consensus samples leads to over-optimistic results and that leveraging perceived hate speech strength enhances performance. They establish new state-of-the-art results for Turkish tweets.