Artificial Intelligence #scalable circuit learning #large language models
CircuitLasso Enables Scalable Interpretability for Large Language Models at Lower Cost
A new approach called CircuitLasso uses sparse linear regression to learn interpretable circuits in large language models. On benchmark data, it achieves structural accuracy comparable to intervention-based methods at a much lower computational cost. The method also reveals relationships among sparse autoencoder features, which helps explain how semantic features propagate through a model.
Jun 16, 2026
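
To make the idea of learning circuit structure with sparse linear regression concrete, here is a minimal sketch. It is not CircuitLasso's actual procedure, which the summary does not describe in detail. It assumes that "upstream" feature activations can be collected as columns of a matrix and that one "downstream" feature is regressed on them with an L1 penalty, so that nonzero coefficients are read as candidate edges. The data is synthetic, and every name (`lasso_cd`, `soft_threshold`, `edges`, the penalty `lam`) is hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of the L1 norm: shrink z toward zero by t.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/2n)*||y - Xw||^2 + lam*||w||_1 by coordinate descent."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n   # per-column mean squared value
    residual = y.copy()                 # equals y - X @ w while w is zero
    for _ in range(n_iter):
        for j in range(p):
            residual += X[:, j] * w[j]          # remove column j's contribution
            rho = X[:, j] @ residual / n        # correlation with partial residual
            w[j] = soft_threshold(rho, lam) / col_sq[j]
            residual -= X[:, j] * w[j]          # restore with the updated weight
    return w

# Synthetic stand-in for feature activations (assumption: 8 upstream features,
# of which only features 1 and 4 actually drive the downstream feature).
rng = np.random.default_rng(0)
n, p = 500, 8
X = rng.standard_normal((n, p))
true_w = np.zeros(p)
true_w[1], true_w[4] = 2.0, -1.5
y = X @ true_w + 0.1 * rng.standard_normal(n)

w = lasso_cd(X, y, lam=0.1)
# Nonzero coefficients are interpreted as candidate upstream -> downstream edges.
edges = [j for j in range(p) if abs(w[j]) > 1e-6]
print(edges, np.round(w, 2))
```

The cost advantage suggested in the summary is plausible in a setup like this because the regression only needs recorded activations, whereas intervention-based methods typically require additional forward passes for each candidate edge. The L1 penalty also shrinks the surviving coefficients, so their magnitudes slightly understate the true effect sizes.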