Topic
interpretability
New DAG-SHAP Method Improves Feature Attribution Using Edge Intervention in Directed Acyclic Graphs
Researchers introduce DAG-SHAP, a feature attribution method for directed acyclic graphs that uses edge intervention to address limitations of node-centric Shapley value approaches. The method captures both externality and exogenous influence, and it is validated on real and synthetic datasets.
New Orthogonal Projection Method Reduces Hallucinations in Vision-Language AI Explanations
Researchers propose Orthogonal Semantic Projection (OSP), a geometric intervention that reduces semantic hallucination in Vision-Language Model explanations. The method orthogonalizes query vectors against distractor concepts, improving attribution fidelity for safety-critical AI applications.
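To make the geometric idea concrete, here is a minimal sketch of orthogonalizing a query vector against a set of distractor vectors, using plain Gram-Schmidt projection. It is an illustration under stated assumptions, not the OSP authors' implementation: the function name `orthogonalize`, the toy vectors, and the choice of Gram-Schmidt are all hypothetical, and the summary does not specify how the paper builds or weights its distractor concepts.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthogonalize(query, distractors, eps=1e-12):
    """Remove from `query` its component in the span of `distractors`.

    Hypothetical helper for illustration only; not taken from the OSP paper.
    """
    basis = []
    for d in distractors:
        # Gram-Schmidt: strip the part of d already covered by the basis.
        r = list(d)
        for b in basis:
            c = dot(r, b)
            r = [x - c * y for x, y in zip(r, b)]
        norm = math.sqrt(dot(r, r))
        if norm > eps:  # skip distractors that are linearly dependent
            basis.append([x / norm for x in r])

    # Subtract the query's projection onto each orthonormal basis vector.
    out = list(query)
    for b in basis:
        c = dot(out, b)
        out = [x - c * y for x, y in zip(out, b)]
    return out


# Toy example (assumed values): two distractor directions in 3-D.
query = [1.0, 2.0, 3.0]
distractors = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
projected = orthogonalize(query, distractors)
```

After the call, `projected` has zero inner product (up to floating-point error) with every distractor, so any similarity score computed from it can no longer be driven by those directions.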
New Definition of Good Explanations Highlights Challenges in Explaining LLM Outputs
A recent arXiv paper by Louis Mahon, Elliot Ford, and Callum Hackett proposes a definition of good explanations inspired by counterfactual explanations but incorporating the interlocutor's prior beliefs. The authors explore the ramifications for AI explainability, particularly why LLM outputs are difficult to explain well.