New Research Demystifies Variance in Circuit Discovery of Large Language Models

A new research paper explores variance in circuit discovery of large language models, identifying resampling, rephrasing, and sample-wise variance. The authors propose CEAP, an improved method over EAP-IG with theoretical guarantees, and argue that rephrasing variance makes it hard to find comprehensive circuits, suggesting LLMs may be inherently difficult to steer.

iGEN Editorial
June 16, 2026

Circuit discovery, a key technique in mechanistic interpretability, aims to pinpoint the components of a large language model (LLM) that are crucial for performing a given task. However, substantial variability in the discovered circuits has raised concerns about the technique's reliability. A new paper, "Demystifying Variance in Circuit Discovery of LLMs" by Wu, Tonin, and Cevher, published on arXiv, systematically examines three types of variance and proposes a new method to mitigate one of them.

The current state-of-the-art method, EAP-IG, performs well on the metric of (un)faithfulness but suffers from substantial variability. The authors identify three distinct categories of variance:

  • Resampling variance: The circuit changes when probing with a new batch of data from the same distribution.
  • Rephrasing variance: The discovered circuit shifts when the prompts are rephrased.
  • Sample-wise variance: A circuit with low population unfaithfulness exhibits large fluctuations in unfaithfulness across individual samples.
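One simple way to picture resampling and rephrasing variance is to compare the circuits found in separate discovery runs as sets of edges. The sketch below is illustrative only and is not taken from the paper: the edge names, the three example runs, and the use of Jaccard similarity as the overlap measure are all assumptions made for this example.

```python
from itertools import combinations


def jaccard(circuit_a, circuit_b):
    """Jaccard similarity between two circuits given as sets of edges."""
    a, b = set(circuit_a), set(circuit_b)
    if not a and not b:
        return 1.0  # two empty circuits are treated as identical
    return len(a & b) / len(a | b)


def mean_pairwise_overlap(circuits):
    """Average Jaccard similarity over all pairs of discovered circuits."""
    pairs = list(combinations(circuits, 2))
    if not pairs:
        raise ValueError("need at least two circuits to compare")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical circuits from three discovery runs (edge names are made up).
# Each run could use a fresh data batch (resampling) or a new prompt
# template (rephrasing); the comparison is the same in both cases.
run_a = {("a0.h1", "m1"), ("m1", "a2.h3"), ("a2.h3", "logits")}
run_b = {("a0.h1", "m1"), ("m1", "a2.h3"), ("a2.h5", "logits")}
run_c = {("a0.h1", "m1"), ("m1", "a2.h3"), ("a2.h3", "logits")}

print(jaccard(run_a, run_b))                          # 0.5
print(mean_pairwise_overlap([run_a, run_b, run_c]))   # about 0.667
```

An average overlap close to 1 would indicate that repeated runs recover nearly the same circuit, while lower values signal the kind of instability the authors describe.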

CEAP: A New Method with Theoretical Guarantees

To address resampling variance, the researchers introduce CEAP, an improvement on EAP-IG that comes with a theoretical guarantee. According to the paper, CEAP can substantially reduce resampling variance, which would make the discovered circuits more stable across different batches of data.

The Challenge of Rephrasing Variance

Rephrasing variance arises because prompts with different templates tend to activate different circuits in the model. The authors argue that this makes it challenging to find a comprehensive circuit that explains and controls the model's behavior on a task expressed in countless templates. They suggest that this phenomenon indicates LLMs may be inherently hard to steer. Interestingly, the paper notes that sparsity, which has been claimed to form more compact and interpretable task circuits, fails to solve this problem.

Sample-Wise Variance: Mostly Benign

Regarding sample-wise variance, the authors argue it is largely benign. Extremely poor unfaithfulness scores often stem from how unfaithfulness is defined rather than from defects in the measured circuits. They show that the magnitude of unfaithfulness is affected by selective contribution scaling, a neural mechanism that accounts for the extremely poor scores sometimes observed.
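The gap between population-level and per-sample unfaithfulness can be seen with a toy calculation. The sketch below is a hedged illustration, not the paper's definition: it assumes unfaithfulness is the absolute difference between a full-model score and a circuit score, and all numbers are invented.

```python
import statistics


def population_gap(full_scores, circuit_scores):
    """Assumed population unfaithfulness: gap between the two mean scores."""
    return abs(statistics.fmean(full_scores) - statistics.fmean(circuit_scores))


def per_sample_gaps(full_scores, circuit_scores):
    """Assumed per-sample unfaithfulness: gap computed sample by sample."""
    if len(full_scores) != len(circuit_scores):
        raise ValueError("score lists must have the same length")
    return [abs(f - c) for f, c in zip(full_scores, circuit_scores)]


# Hypothetical task scores (e.g. logit differences) on four prompts.
full_scores = [2.0, 2.0, 2.0, 2.0]      # full model
circuit_scores = [3.0, 1.0, 2.5, 1.5]   # discovered circuit only

gaps = per_sample_gaps(full_scores, circuit_scores)

print(population_gap(full_scores, circuit_scores))  # 0.0
print(gaps)                                         # [1.0, 1.0, 0.5, 0.5]
print(statistics.pstdev(gaps))                      # 0.25
```

In this made-up case the circuit matches the full model on average, yet every individual prompt shows a nonzero gap, which is the pattern the article calls sample-wise variance.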

| Variance type | Definition | Key insight |
| --- | --- | --- |
| Resampling variance | Circuit changes with new data batches from the same distribution | CEAP method reduces this variance |
| Rephrasing variance | Circuit shifts when prompts are rephrased | Suggests LLMs may be inherently hard to steer; sparsity doesn't help |
| Sample-wise variance | Unfaithfulness fluctuates across individual samples | Mostly benign; poor scores are due to the definition, not circuit defects |

For enterprise technology decision-makers, this research underscores the importance of understanding the limitations of current interpretability methods when deploying LLMs in production environments. While circuit discovery can pinpoint relevant components, variance across rephrasings and data samples means that a single discovered circuit may not reliably represent model behavior for all inputs. The CEAP method offers a step forward in reducing resampling variance, but the fundamental challenge of rephrasing variance suggests that steering LLMs with high reliability remains an open problem.

