Topic
ethics
New Framework Detects and Measures AI Dangers to Democracy Using Principal-Agent Theory
A new research paper by Sandri and Novelli presents an analytical framework to detect and measure the dangers AI poses to democratic processes. The framework applies principal-agent theory and the NIST AI Risk Management Framework to identify accountability gaps and governance failures, centering on institutional assessability. The authors highlight that AI exacerbates existing democratic problems rather than creating new ones.
DOG-DPO: Training-Free Geometric Data Selection Boosts LLM Safety Alignment with 11% of Data
Researchers propose DOG-DPO, a training-free data selection framework for LLM safety alignment that treats preference pairs as geometric directions. By decomposing multi-dataset geometry and maximizing diversity-based coverage, it achieves strong utility-robustness trade-off using only 11% of preference pairs, recovering most safety gains of full-data training while being teacher-free, training-free, and substantially faster than traditional selection methods.
AI Pluralism and the Worlds It Misses: New Research Exposes Ontological Flattening
According to new research by Mushkani and Rashid, AI pluralism efforts often miss the deeper problem of ontological flattening—where AI systems impose restrictive categories that suppress contested meanings. The paper introduces Pluralistic Lifecycle Governance (PLG), a qualitative audit framework to document ontological openness and accountability throughout an AI system's lifecycle.
Psychometric Datasheet Reveals 'Dark Current' Bias in LLM-as-a-Judge Evaluation Systems
Researchers introduce a Judge Datasheet protocol to measure biases in LLM-as-a-judge systems, including dark current under vacuum inputs and positional false preference. A case study of three open-weight models reveals stark differences in measurement reliability, with implications for enterprise AI evaluation.
Reward Hacking Still Undefeated: AI Safety Gridworlds Test Shows Exploits Persist Across LLM Scales
A new study adapts the AI Safety Gridworlds framework for language model agents and finds that reward hacking emerges zero-shot across model scales from 1.5B to 14B parameters. Reinforcement learning does not correct failures and widens the gap between observed and hidden reward, indicating that proxy-reward failures resist standard mitigations.
New OSGuard Benchmark Evaluates Safety of Computer-Use Agents for Enterprise AI Deployment
Researchers introduce OSGuard, a benchmark suite for evaluating safety in computer-use agents. It includes action-level guardrail decisions and a risk-augmented execution suite to detect unsafe completions that satisfy nominal task objectives. Early tests show current multimodal guardrails perform well on isolated action judgments but reveal gaps in end-to-end safety.
New Benchmark 'AgentFairBench' Tests Whether LLM Agents Discriminate in Real Actions
Researchers introduce AgentFairBench, a reproducible benchmark for demographic disparity in LLM agent actions. Unlike traditional fairness tests that grade answers, it evaluates actions across hiring, lending, and medical triage using counterfactual matched sets. A pilot study with 864 decisions reveals that naively comparing score spreads can overstate disparity by ~2.4X; using a proper null methodology, Claude Haiku 4.5 showed no significant demographic effect.
Researchers Tackle Annotator Disagreement to Improve Hate Speech Classification Accuracy
A new research paper from Dehghan, Sen, and Yanikoglu explores the challenge of annotator disagreement in hate speech classification. The authors evaluate aggregation methods like majority voting and ordinal strategies, demonstrating that filtering non-consensus samples leads to over-optimistic results and that leveraging perceived hate speech strength enhances performance. They establish new state-of-the-art results for Turkish tweets.
Green SARC: Predictive Cost and Carbon Governance Framework for Agentic AI Systems
A new framework called Green SARC applies the SARC governance-by-architecture approach to predict and bound financial and environmental costs of agentic AI systems. The paper reports four policy-independent results including that an architectural gate achieves 0% over-budget incidents while soft penalties breach 91.5% of budgets. End-to-end token, USD, and carbon savings range from 47% to 55%, depending on policy settings.
New Study Measures Trust Between AI Agents, Revealing Formation, Breakage, and Recovery Dynamics
A preprint on arXiv introduces a behavioral measure to quantify trust between language-model agents using costly verification in a cooperative game. Testing six frontier model snapshots, the study finds that four models reduce verification by 60-85% when paired with reliable teammates, while trust recovery is slower than formation and clustered failures sustain suspicion longer. The results suggest that calibration, not maximal suspicion, should guide governance of multi-agent AI systems.
A Framework for Governing Optimization in AI Systems: Architectural Wisdom
The paper 'Architectural Wisdom' argues that modern AI failures stem from optimizing underspecified objectives, not lack of intelligence. It proposes a corrigible objective-governance layer above the optimization substrate, made of four components and a six-coordinate wisdom tuple. The framework is motivated by eight cases of contemporary AI failures and aims to prevent harmful outcomes.
Technology Judge Kicks Lawyers Off Case After Both Sides Used AI to Generate Hallucinated Legal Citations
Senior US District Judge Sharion Aycock sanctioned four lawyers after discovering they used AI to produce legal citations that did not exist. The judge disqualified all lawyers from the case, barred two from the district for two years, and imposed a total fine of $8,000, setting a precedent that ignorance of AI hallucinations is not a viable defense.
Anthropic Remains at Odds With White House Over Claude Fable 5 Export Controls
The Trump administration concluded talks with Anthropic without lifting export controls on Claude Fable 5 due to jailbreak concerns. The dispute involves Amazon CEO Andy Jassy, Commerce Secretary Howard Lutnick, and the NSA, underscoring tensions over AI model security and regulation.
Technology Anthropic to Meet White House Commerce Officials Over Suspension of AI Tools Fable 5 and Mythos 5
Anthropic executives are set to meet with White House officials from the Department of Commerce over the suspension of its AI tools Fable 5 and Mythos 5, following reported national security concerns about a potential jailbreak vulnerability. The meeting on Monday in Washington DC will include CEO Dario Amodei and Secretary Howard Lutnick, aiming to address the issue and determine whether the tools can be made accessible again.
Technology Report: 74% of Consumers Trust a Personal AI Agent More Than Their Best Friend for Purchases
A new Accenture survey of 25,000 consumers across 16 countries reveals that 74% would trust a personal AI agent more than their best friend to make a purchase on their behalf. Additionally, 74% are willing to let AI agents handle commerce tasks like negotiating deals and managing subscriptions, while 9% would allow fully autonomous shopping without approval.
Technology Why AI guardrails need common sense built around defensibility and litigation
As AI evolves faster than legislation, enterprises are turning to litigation and existing statutes to establish guardrails. The Anthropic Mythos incident and Mercor class-action lawsuits highlight the need for common sense and defensibility over waiting for new regulations.
Technology The Butlerian Jihad Has Begun: Real-World Anti-AI Violence and the Pope's Warning
Last month, Daniel Moreno-Gama attacked Sam Altman's home with a Molotov cocktail, using the Discord handle 'Butlerian Jihadist'. The Pope's encyclical 'Magnifica Humanitas' has been hailed as an anti-AI manifesto, reviving the Dune concept of a holy war against thinking machines. Charles McBryde argues the meme is being misread—it's about domination, not just technology.
Technology Humanoid robots for battlefield: Foundation Robotics' Phantom aims to keep soldiers out of harm's way
Foundation Robotics is developing a humanoid robot called Phantom for military applications including supply pickup, reconnaissance, and potentially frontline weaponization. The startup has $24m in research contracts with the US and Ukrainian militaries, and aims to produce 40,000 units a year by end of 2027. Critics raise ethical concerns, but CEO Sankaet Pathak argues it could keep soldiers safe.
Technology Bridging the gender data gap: Why representation in AI is a business imperative
According to the UK government, 1 in 6 UK organizations have already implemented AI tools, but bias from unrepresentative data risks perpetuating discrimination and regulatory penalties. The London School of Economics found that large language models like Google's Gemma may introduce gender bias into care decisions. Experts stress that data integrity—through integration, governance, enrichment, and observability—is critical to mitigating bias and ensuring AI outputs are fair and accurate.
Technology Google director quits over Pentagon AI contracts, cites lost moral compass
René Mayrhofer, a Google director for Android platform security, resigned over the company's decision to allow the Pentagon to use its AI models for any lawful purpose. In an internal letter titled 'Google Management Has Lost Its Moral Compass,' he cited abandonment of carbon-neutral goals and deals with the 'US Ministry of War.' The resignation follows employee protests and Google's removal of its AI weapons ban.