New Definition of Good Explanations Highlights Challenges in Explaining LLM Outputs

A recent arXiv paper by Louis Mahon, Elliot Ford, and Callum Hackett proposes a definition of good explanations that is inspired by counterfactual explanations but also incorporates the interlocutor's prior beliefs. The authors explore the ramifications for AI explainability, particularly why it is difficult to produce good explanations for LLM outputs.

iGEN Editorial

June 16, 2026


Enterprise technology buyers increasingly demand explainability from AI systems, yet a clear standard for what constitutes a good explanation remains elusive. A new paper posted to arXiv on June 12, 2026, by researchers Louis Mahon, Elliot Ford, and Callum Hackett tackles this gap by proposing a definition of good explanations and applying it to the particular challenges of large language models (LLMs).

The Problem of Explainability in AI

The paper notes that explainability is "crucial for AI adoption in many contexts." However, without an agreed-upon definition of what makes an explanation good, efforts to build explainable AI systems lack a benchmark. The authors argue that existing approaches often overlook the human element.

A New Definition of Good Explanations

The researchers build on the concept of counterfactual explanations, which describe how changing certain inputs would alter an output. According to the paper, a good explanation must also account for "the interlocutor's prior beliefs in each fact that could be offered in an explanation." In other words, a fact that would change the outcome is only a useful explanation if it tells the listener something they did not already believe; an effective explanation bridges the gap between what the listener already knows and the reasoning behind the AI's decision.
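To make the idea concrete, here is a minimal sketch in Python. It is an illustration only, not the paper's formal definition: the shipment-flagging rule, the fact names, the listener's prior values, and the "one minus prior" scoring are all assumptions made up for this example. It first finds the facts whose change alone would flip a decision (the counterfactual part), then ranks them by how little the listener already believes them (the prior-belief part).

```python
from typing import Callable, Dict, List, Tuple

Facts = Dict[str, bool]


def counterfactual_facts(model: Callable[[Facts], bool], facts: Facts) -> List[str]:
    """Return the facts whose flip, on its own, changes the model's output."""
    baseline = model(facts)
    relevant = []
    for name, value in facts.items():
        flipped = {**facts, name: not value}
        if model(flipped) != baseline:
            relevant.append(name)
    return relevant


def rank_explanations(
    model: Callable[[Facts], bool],
    facts: Facts,
    listener_prior: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Rank counterfactually relevant facts, least-believed first.

    listener_prior[name] is the listener's assumed probability that the
    fact holds. The score (1 - prior) is a toy stand-in for how much the
    listener would learn; it is not taken from the paper.
    """
    scored = []
    for name in counterfactual_facts(model, facts):
        prior = listener_prior.get(name, 0.5)  # unknown facts default to 50/50
        scored.append((name, 1.0 - prior))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def shipment_flagged(facts: Facts) -> bool:
    # Hypothetical decision rule, standing in for an opaque AI system.
    return facts["missing_documents"] or (
        facts["high_risk_origin"] and facts["high_value"]
    )


facts = {
    "missing_documents": False,
    "high_risk_origin": True,
    "high_value": True,
    "new_supplier": True,
}
# Assumed listener: already fairly sure the origin is high-risk,
# but does not expect the shipment to be high-value.
listener_prior = {"high_risk_origin": 0.9, "high_value": 0.2, "new_supplier": 0.5}

ranking = rank_explanations(shipment_flagged, facts, listener_prior)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In this toy case both the origin and the value of the shipment are counterfactually relevant, but the value ranks first because the assumed listener did not already believe it. A different listener, with different priors, would get a different ranking from the same decision, which is the dependence on the interlocutor that the definition emphasizes.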

Challenges Specific to Large Language Models

The paper explores the ramifications of this definition for AI explainability and, in particular, "why LLM outputs are difficult to produce good explanations for." LLMs generate text from vast, opaque internal representations, making it hard to trace an output back to specific inputs or training data. The authors' definition adds a second difficulty: explaining an LLM's output also requires a model of the user's prior knowledge, which differs from one user to the next and changes over time.

Implications for Enterprise Adoption

For CTOs and technology procurement leaders, the research underscores a fundamental tension: LLMs offer powerful capabilities but resist transparent reasoning. The paper implies that current explainability tools may fall short because they do not model the recipient's beliefs. Enterprise buyers should evaluate AI vendors not only on model accuracy but also on the quality of explanations they can provide, particularly for high-stakes supply chain or trade finance decisions.

The authors' work contributes a philosophical foundation that could guide future tools. As AI permeates customs technology and logistics automation, the ability to produce genuinely good explanations, ones that align with the decision-maker's mental model, will become a competitive differentiator. The paper is available on arXiv under a Creative Commons license and has been shared on platforms like Reddit and BibSonomy.

