Topic
bias
P3B3 Benchmark Reveals Strong Brazilian Portuguese Bias in Large Language Models
A new research paper introduces P3B3, an expert-curated benchmark for measuring bias between European and Brazilian Portuguese in large language models. Experiments show that most LLMs strongly prefer Brazilian Portuguese, underscoring the need for more balanced representation of Portuguese varieties in conversational AI.
Primacy Bias in Multimodal RAG: First Retrieved Items Dominate, Study Finds
A research paper titled 'Lost at the End: Primacy Bias in Multimodal Retrieval-Augmented Question Answering' introduces a controlled probe to measure position bias in multimodal KB-VQA. The study finds a strong primacy effect, where the first retrieved passage significantly outperforms later ones, contrasting with the U-shaped 'lost-in-the-middle' pattern in text-only models. The findings call for reader-side interventions and question the adequacy of recall@k as a metric for deployed systems.
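A position-bias probe of this kind can be pictured as grouping answers by where the supporting passage sat in the retrieved list. The sketch below is illustrative only: the record format and the `accuracy_by_position` helper are assumptions, not the paper's code, and the toy records are made up.

```python
from collections import defaultdict

def accuracy_by_position(records):
    """Group (gold_position, is_correct) pairs and return accuracy per position.

    gold_position is the 0-based slot of the supporting passage in the
    retrieved list (a hypothetical record format, not the paper's).
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for position, is_correct in records:
        totals[position] += 1
        hits[position] += int(is_correct)
    return {pos: hits[pos] / totals[pos] for pos in sorted(totals)}

# Made-up toy records, not results from the study.
toy_records = [(0, True), (0, True), (1, True), (1, False), (2, False), (2, False)]
by_position = accuracy_by_position(toy_records)
```

In every toy record the supporting passage is inside the top 3, so recall@3 is identical across them; only a per-position breakdown like this one exposes a drop in answer accuracy for later slots.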
AI Pluralism and the Worlds It Misses: New Research Exposes Ontological Flattening
According to new research by Mushkani and Rashid, AI pluralism efforts often miss the deeper problem of ontological flattening—where AI systems impose restrictive categories that suppress contested meanings. The paper introduces Pluralistic Lifecycle Governance (PLG), a qualitative audit framework to document ontological openness and accountability throughout an AI system's lifecycle.
Psychometric Datasheet Reveals 'Dark Current' Bias in LLM-as-a-Judge Evaluation Systems
Researchers introduce a Judge Datasheet protocol to measure biases in LLM-as-a-judge systems, including dark current under vacuum inputs and positional false preference. A case study of three open-weight models reveals stark differences in measurement reliability, with implications for enterprise AI evaluation.
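One common way to probe position preference in a pairwise judge is to present each pair in both orders and count how often the chosen item changes. The sketch below assumes a hypothetical `judge(first, second)` callable that returns "first" or "second". It is not the Judge Datasheet protocol itself, whose exact definitions are not given here.

```python
def positional_inconsistency_rate(pairs, judge):
    """Fraction of pairs where swapping presentation order changes the chosen item.

    `judge` is a hypothetical callable returning "first" or "second".
    Each pair must contain two distinct items.
    """
    inconsistent = 0
    for a, b in pairs:
        pick_ab = a if judge(a, b) == "first" else b
        pick_ba = b if judge(b, a) == "first" else a
        if pick_ab != pick_ba:
            inconsistent += 1
    return inconsistent / len(pairs)

def always_first(first, second):
    """Toy judge driven purely by position."""
    return "first"

def longer_wins(first, second):
    """Toy judge driven purely by content (answer length)."""
    return "first" if len(first) > len(second) else "second"

toy_pairs = [("short", "a longer answer"), ("tiny", "medium one")]
position_driven = positional_inconsistency_rate(toy_pairs, always_first)
content_driven = positional_inconsistency_rate(toy_pairs, longer_wins)
```

A judge that only looks at position flips on every pair, while one that only looks at content never does. Real judges fall somewhere in between.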
Study Finds Gender Differences in AI Literacy and Deepfake Engagement Among Australian Students
A study of 199 Australian secondary students found significant gender differences in baseline AI literacy, deepfake engagement, and STEM career aspirations. Male students reported higher STEM career interest, while female students were more likely to use AI for schoolwork and seek advice from AI tools. A one-day AI literacy workshop improved knowledge for both genders, with females showing broader gains including increased confidence and career interest in AI and computer science.
Algorithm Audit Reveals LLM Hotel Recommendations Biased by Eco-Labels, Ignore Management Responses
A pre-specified algorithm audit of 12 large language models (LLMs) found that guest rating and price dominate hotel recommendations, while eco-certification is overweighted and management response is ignored. List position, a content-free artifact, also causally shifts recommendations, with an effect worth about $12 per night. The study offers empirical grounding for generative engine optimization and for the accountability of AI infomediaries.
New Benchmark 'AgentFairBench' Tests Whether LLM Agents Discriminate in Real Actions
Researchers introduce AgentFairBench, a reproducible benchmark for demographic disparity in LLM agent actions. Unlike traditional fairness tests that grade answers, it evaluates actions across hiring, lending, and medical triage using counterfactual matched sets. A pilot study with 864 decisions reveals that naively comparing score spreads can overstate disparity by about 2.4x; under a proper null methodology, Claude Haiku 4.5 showed no significant demographic effect.
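The gap between a raw score spread and a null-calibrated one can be illustrated with a permutation test: the spread between group means is positive even under pure noise, so it is compared against spreads obtained after shuffling the group labels. This is a minimal sketch under that assumption. The function names are hypothetical, the data are made up, and the benchmark's own null methodology (which uses matched counterfactual sets) may differ.

```python
import random
from statistics import mean

def group_spread(scores, labels):
    """Max minus min of per-group mean scores."""
    groups = {}
    for score, label in zip(scores, labels):
        groups.setdefault(label, []).append(score)
    means = [mean(values) for values in groups.values()]
    return max(means) - min(means)

def permutation_p_value(scores, labels, n_perm=2000, seed=0):
    """Return (observed spread, p-value) under a label-shuffling null.

    Simplification: labels are shuffled across all decisions rather than
    within matched sets.
    """
    rng = random.Random(seed)
    observed = group_spread(scores, labels)
    shuffled = list(labels)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if group_spread(scores, shuffled) >= observed:
            at_least_as_large += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (at_least_as_large + 1) / (n_perm + 1)

# Made-up scores with an obvious group effect (illustrative only).
toy_scores = [0] * 6 + [10] * 6
toy_labels = ["a"] * 6 + ["b"] * 6
spread, p_value = permutation_p_value(toy_scores, toy_labels)
```

A spread that looks large on its own only counts as evidence of disparity when few shuffled spreads reach it.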
Researchers Tackle Annotator Disagreement to Improve Hate Speech Classification Accuracy
A new research paper from Dehghan, Sen, and Yanikoglu explores the challenge of annotator disagreement in hate speech classification. The authors evaluate aggregation methods like majority voting and ordinal strategies, demonstrating that filtering non-consensus samples leads to over-optimistic results and that leveraging perceived hate speech strength enhances performance. They establish new state-of-the-art results for Turkish tweets.
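The two aggregation families mentioned here can be contrasted on a single tweet's annotations. The sketch below is illustrative only: the function names, the tie-breaking rule, and the 0-to-2 strength scale are assumptions, not the authors' implementation.

```python
from collections import Counter
from statistics import median

def majority_vote(labels):
    """Return (label, has_strict_majority).

    Ties are broken toward the smallest label, an arbitrary choice made
    for this sketch.
    """
    counts = Counter(labels)
    top = max(counts.values())
    winners = sorted(label for label, count in counts.items() if count == top)
    has_majority = len(winners) == 1 and top > len(labels) / 2
    return winners[0], has_majority

def ordinal_median(strengths):
    """Aggregate ordinal strength ratings by their median."""
    return median(strengths)

# Hypothetical annotations on an assumed scale: 0 = none, 1 = mild, 2 = strong.
agreed = majority_vote([0, 1, 1])
split = majority_vote([0, 1, 2])
split_strength = ordinal_median([0, 1, 2])
```

Dropping every item where `has_strict_majority` is false would remove exactly the hard, contested cases from evaluation, which is consistent with the over-optimism the authors report for filtering non-consensus samples.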
Bridging the gender data gap: Why representation in AI is a business imperative
According to the UK government, 1 in 6 UK organizations have already implemented AI tools. Bias from unrepresentative data, however, risks perpetuating discrimination and incurring regulatory penalties. The London School of Economics found that large language models like Google's Gemma may introduce gender bias into care decisions. Experts stress that data integrity, built on integration, governance, enrichment, and observability, is critical to mitigating bias and ensuring AI outputs are fair and accurate.
Algorithmic Monocultures: Impact on Hiring Diversity
Algorithmic monocultures in hiring produce homogeneous outcomes that reduce diversity. Over 90% of U.S. employers use similar algorithms, leading to systemic rejections and racial disparities.