auditing

4 stories

Artificial Intelligence #aura #adaptive uncertainty-aware refinement

AURA: Adaptive Uncertainty-Aware Refinement Framework for Auditing LLM-as-a-Judge Decisions

A new framework named AURA (Adaptive Uncertainty-Aware Refinement) addresses the challenge of auditing large language models used as judges of open-ended generation. It iteratively learns a human-consistency signal, propagates reliable evidence, and prioritizes uncertain comparisons for human review. The approach treats trust in a judge as a latent quantity that is progressively refined as evidence accumulates.

Jul 8, 2026 1 source

AuAu Benchmark Audits Authoritarian Alignment in Large Language Models from Four Regions

Technology

Artificial Intelligence #benchmark #auditing

Researchers introduce AuAu, a benchmark that assesses authoritarian alignment in LLMs using psychometric tests, vignettes, and user prompts. Tests of 17 models from China, the EU, Russia, and the USA revealed substantial authoritarian response rates and showed that the models are easily manipulated via system prompts.

Jun 16, 2026 1 source

New Auditing Framework Detects Synthetic Data Privacy Leaks Without Model Access

Technology

Artificial Intelligence #synthetic data #auditing

A new causal framework for auditing synthetic data detects privacy leaks by distinguishing true disclosures from phantom ones. It uses statistical hypothesis testing with holdout sets, requires no model access or canary insertion, and is orders of magnitude more efficient than shadow-model approaches.

Jun 16, 2026 1 source

Auditing Reward Hackability in Code RL Training Environments Reveals 28.5% Weak Test Suites

Technology

Artificial Intelligence #auditing #reward hackability

An arXiv paper by Rajan measures reward hackability in code reinforcement learning (RL) training environments. On a 49-task sample of SWE-bench Verified, 28.5% of tasks have test suites weak enough that a Docker-verified incorrect patch passes them. The study also proposes a hardening procedure that uses an LLM judge and a Docker gate to detect defects.

Jun 16, 2026 1 source