New Research Demystifies Variance in Circuit Discovery of Large Language Models

A new research paper explores variance in circuit discovery for large language models and identifies three types: resampling, rephrasing, and sample-wise variance. The authors propose CEAP, an improvement on EAP-IG that comes with theoretical guarantees, and argue that rephrasing variance makes comprehensive circuits hard to find, which suggests LLMs may be inherently difficult to steer.

iGEN Editorial

June 16, 2026

Circuit discovery, a key technique in mechanistic interpretability, aims to pinpoint the model components crucial for performing a given task in large language models (LLMs). However, substantial variability in the discovered circuits has raised concerns about reliability. A new paper, "Demystifying Variance in Circuit Discovery of LLMs" by Wu, Tonin, and Cevher, published on arXiv, systematically examines three types of variance and proposes a new method to mitigate one of them.

The current state-of-the-art method, EAP-IG, performs well on the metric of (un)faithfulness but suffers from substantial variability. The authors identify three distinct categories of variance:

Resampling variance: The circuit changes when probing with a new batch of data from the same distribution.
Rephrasing variance: The discovered circuit shifts when the prompts are rephrased.
Sample-wise variance: A circuit with low population unfaithfulness exhibits large fluctuations in unfaithfulness across individual samples.
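
To make resampling and rephrasing variance concrete, the sketch below scores how much circuits discovered from different batches (or different prompt templates) agree with one another. It is an illustration only: the Jaccard overlap used here is an assumed, generic set-similarity measure, not necessarily the metric used in the paper, and the edge names and batch circuits are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of circuit edges."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty circuits are treated as identical
    return len(a & b) / len(a | b)


def mean_pairwise_overlap(circuits):
    """Average Jaccard similarity over all pairs of discovered circuits."""
    pairs = list(combinations(circuits, 2))
    if not pairs:
        raise ValueError("need at least two circuits to compare")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical circuits (sets of edge names), one per data batch.
batch_circuits = [
    {"a0->m1", "m1->a2", "a2->out"},
    {"a0->m1", "m1->a2", "a3->out"},
    {"a0->m1", "m2->a2", "a2->out"},
]

overlap = mean_pairwise_overlap(batch_circuits)
print(f"mean pairwise overlap: {overlap:.2f}")
```

A value near 1 would indicate that the discovered circuit is stable across batches; lower values indicate more variance. The same comparison could be run on circuits discovered from rephrased prompts.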

CEAP: A New Method with Theoretical Guarantees

To address resampling variance, the researchers introduce CEAP, an improvement on EAP-IG that includes a theoretical guarantee. According to the paper, CEAP can substantially lessen resampling variance. The method's enhanced stability makes it more reliable for identifying important components across different data samples.

The Challenge of Rephrasing Variance

Rephrasing variance arises because prompts with different templates tend to activate different circuits in the model. The authors argue that this makes it challenging to find a comprehensive circuit that explains and controls the model's behavior on a task expressed in countless templates. They suggest that this phenomenon indicates LLMs may be inherently hard to steer. Interestingly, the paper notes that sparsity, which has been claimed to form more compact and interpretable task circuits, fails to solve this problem.

Sample-Wise Variance: Mostly Benign

Regarding sample-wise variance, the authors argue it is largely benign. Extremely poor unfaithfulness scores often stem from how unfaithfulness is defined rather than from defects in the measured circuits. They show that the magnitude of unfaithfulness is affected by selective contribution scaling, a neural mechanism that accounts for the extremely poor scores sometimes observed.
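
The gap between a population-level score and individual samples can be illustrated with a few lines of arithmetic. The numbers below are invented for illustration, and treating the population score as the mean of per-sample scores is a simplifying assumption; the paper's exact definition of unfaithfulness is not reproduced here.

```python
import statistics

# Hypothetical per-sample unfaithfulness scores for one circuit
# (lower is better). One sample scores far worse than the rest.
per_sample = [0.02, 0.01, 0.03, 0.02, 0.95, 0.01, 0.02, 0.04]

# Assumption: the population-level score is the mean over samples.
population = statistics.mean(per_sample)
spread = statistics.pstdev(per_sample)  # population standard deviation
worst = max(per_sample)

print(f"population score: {population:.4f}")
print(f"per-sample spread: {spread:.4f}")
print(f"worst sample: {worst:.2f}")
```

In this toy case a single extreme sample dominates the spread even though most samples score close to zero, which is the kind of pattern the authors describe as largely benign.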

Variance Type	Definition	Key Insight
Resampling variance	Circuit changes with new data batches from same distribution	CEAP method reduces this variance
Rephrasing variance	Circuit shifts when prompts are rephrased	Suggests LLMs may be inherently hard to steer; sparsity does not help
Sample-wise variance	Unfaithfulness fluctuations across individual samples	Mostly benign; poor scores due to definition, not circuit defects

For enterprise technology decision-makers, this research underscores the importance of understanding the limitations of current interpretability methods when deploying LLMs in production environments. While circuit discovery can pinpoint relevant components, variance across rephrasings and data samples means that a single discovered circuit may not reliably represent model behavior for all inputs. The CEAP method offers a step forward in reducing resampling variance, but the fundamental challenge of rephrasing variance suggests that steering LLMs with high reliability remains an open problem.
