Large language model (LLM)-based agents are evolving from passive text generators into autonomous systems capable of planning, tool use, retrieval, memory access, environmental interaction, and multi-agent collaboration. According to a comprehensive survey published on arXiv, these expanded capabilities make agent behavior harder to verify, debug, and audit. Final-answer accuracy alone cannot explain how an output was produced, which evidence supported each claim, whether tool calls were justified, how memory influenced later decisions, or where failures originated. The survey, titled "From Agent Traces to Trust: A Survey of Evidence Tracing and Execution Provenance in LLM Agents," examines evidence tracing and execution provenance as foundations for process-level accountability in trustworthy LLM agents.
Defining Execution Provenance and Evidence Tracing
The survey defines execution provenance as the typed graph of an agent's execution, and evidence tracing as the projection of that graph onto evidence-support relations, that is, the part of the graph linking claims to the evidence that supports them. According to the authors, this perspective connects retrieval grounding, claim support, tool-use safety, memory lineage, observability, debugging, audit, and recovery within a unified framework.
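To make the graph-and-projection idea concrete, here is a minimal Python sketch, assuming a simple in-memory graph. The class names (`Node`, `Edge`, `ProvenanceGraph`), the node kinds, and relation labels such as `supported_by` are hypothetical choices for illustration, not the survey's schema. Evidence tracing is modeled as filtering the full execution graph down to its evidence-support edges.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical node/edge vocabulary for illustration; the survey's own types may differ.
@dataclass(frozen=True)
class Node:
    node_id: str
    kind: str  # e.g. "user_query", "tool_call", "retrieved_doc", "claim"
    content: str


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    relation: str  # e.g. "triggered_by", "returned_by", "supported_by"


@dataclass
class ProvenanceGraph:
    nodes: dict[str, Node] = field(default_factory=dict)
    edges: list[Edge] = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def add_edge(self, src: str, dst: str, relation: str) -> None:
        # Refuse dangling edges so every relation points at a recorded unit.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError(f"unknown endpoint in edge {src} -> {dst}")
        self.edges.append(Edge(src, dst, relation))

    def project(self, relations: set[str]) -> ProvenanceGraph:
        """Return the subgraph containing only edges of the given relation types."""
        view = ProvenanceGraph()
        for edge in self.edges:
            if edge.relation in relations:
                view.add_node(self.nodes[edge.src])
                view.add_node(self.nodes[edge.dst])
                view.add_edge(edge.src, edge.dst, edge.relation)
        return view


# Record a tiny execution: query -> tool call -> retrieved document -> claim.
g = ProvenanceGraph()
g.add_node(Node("q", "user_query", "What is the capital of France?"))
g.add_node(Node("t1", "tool_call", "search('capital of France')"))
g.add_node(Node("d1", "retrieved_doc", "Paris is the capital of France."))
g.add_node(Node("c1", "claim", "The capital of France is Paris."))
g.add_edge("t1", "q", "triggered_by")
g.add_edge("d1", "t1", "returned_by")
g.add_edge("c1", "d1", "supported_by")

# Evidence tracing as a projection onto evidence-support edges.
evidence_view = g.project({"supported_by"})
print(sorted(evidence_view.nodes))  # ['c1', 'd1']
print(len(g.edges), len(evidence_view.edges))  # 3 1
```

Because the projection is only a filtered view, the same execution record can serve several purposes: passing a different relation set would yield, say, a tool-invocation view instead of an evidence view.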
A Unified Taxonomy for Trustworthy Agents
The survey introduces a taxonomy covering:
- Trace sources
- Evidence and execution units
- Provenance relations
- Tracing granularity and timing
- Representation forms
- Trust functions
This taxonomy provides a structured way to categorize and compare different approaches to agent transparency and accountability.
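One way to picture how the taxonomy supports comparison is to describe each approach as a record with one field per dimension and then ask where two records differ. The sketch below assumes exactly that; the `TaxonomyProfile` class, the `differing_dimensions` helper, and every value in the two example profiles are hypothetical placeholders rather than categories defined in the survey.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TaxonomyProfile:
    """One approach placed along the survey's six taxonomy dimensions.

    Field names follow the dimensions listed in the survey; the values used
    below are hypothetical placeholders, not categories the survey defines.
    """
    trace_source: str
    units: str  # evidence and execution units
    relations: frozenset[str]  # provenance relations
    granularity_and_timing: str
    representation: str
    trust_function: str


def differing_dimensions(a: TaxonomyProfile, b: TaxonomyProfile) -> list[str]:
    """List the taxonomy dimensions on which two approaches differ."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


approach_a = TaxonomyProfile(
    trace_source="tool logs",
    units="tool calls",
    relations=frozenset({"invoked"}),
    granularity_and_timing="step-level, at runtime",
    representation="typed graph",
    trust_function="tool-use safety",
)
approach_b = TaxonomyProfile(
    trace_source="retrieved documents",
    units="claims and passages",
    relations=frozenset({"supported_by"}),
    granularity_and_timing="claim-level, post hoc",
    representation="typed graph",
    trust_function="claim support",
)

# Every dimension except representation differs between the two profiles.
print(differing_dimensions(approach_a, approach_b))
```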
Methodological Directions in Provenance Research
The authors review key methodological directions, including:
- Provenance representation – how to encode the execution graph
- Evidence attribution – linking claims back to specific evidence
- Tool-use provenance – tracking which tool calls were made and why
- Runtime guardrails – preventing unsafe actions
- Provenance-bearing memory – memory that retains its own source context
- Observability – enabling real-time monitoring of agent internals
- Failure diagnosis – identifying where and why errors occurred
These directions, the survey states, are critical for building provenance-aware, auditable, and recoverable agent systems.
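Three of these directions (tool-use provenance, runtime guardrails, and failure diagnosis) can be illustrated together. The following minimal sketch assumes a hypothetical `TracedToolRunner` that logs each tool call with a stated rationale and its upstream dependencies, refuses calls on a simple deny-list before they run, and diagnoses a failure by walking the logged dependencies back to the earliest failed step. The deny-list is a deliberately crude stand-in for a real guardrail policy, the toy tools `search`, `summarize`, and `delete_file` are invented for the example, and none of these names come from the survey.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ToolCallRecord:
    step: int
    tool: str
    args: dict[str, Any]
    rationale: str  # why the agent made this call
    parents: list[int]  # earlier steps whose outputs this call consumed
    allowed: bool = True
    ok: bool = True
    output: Any = None
    error: str | None = None


class TracedToolRunner:
    """Hypothetical wrapper that records provenance for every tool call."""

    def __init__(self, tools: dict[str, Callable[..., Any]], blocked: set[str]):
        self.tools = tools
        self.blocked = blocked  # crude deny-list standing in for a guardrail policy
        self.log: list[ToolCallRecord] = []

    def call(self, tool: str, args: dict[str, Any], rationale: str, parents=()) -> Any:
        rec = ToolCallRecord(len(self.log), tool, dict(args), rationale, list(parents))
        self.log.append(rec)  # log before executing, so blocked calls are audited too
        if tool in self.blocked:
            rec.allowed, rec.ok, rec.error = False, False, "blocked by guardrail"
            return None
        try:
            rec.output = self.tools[tool](**args)
        except Exception as exc:  # record the failure instead of crashing the agent
            rec.ok = False
            rec.error = f"{type(exc).__name__}: {exc}"
        return rec.output

    def root_causes(self, step: int) -> list[int]:
        """Failed ancestors of `step` (inclusive) that have no failed parent."""
        seen, stack = set(), [step]
        while stack:
            s = stack.pop()
            if s not in seen:
                seen.add(s)
                stack.extend(self.log[s].parents)
        failed = {s for s in seen if not self.log[s].ok}
        return sorted(s for s in failed if not any(p in failed for p in self.log[s].parents))


def search(query: str) -> str:
    corpus = {"paris": "Paris is the capital of France."}
    return corpus[query]  # raises KeyError for queries not in the toy corpus


def summarize(text: str) -> str:
    return text.split(".")[0]


runner = TracedToolRunner(
    {"search": search, "summarize": summarize, "delete_file": lambda path: None},
    blocked={"delete_file"},
)
doc = runner.call("search", {"query": "lyon"}, rationale="find a supporting passage")
runner.call("summarize", {"text": doc}, rationale="condense the passage", parents=[0])
runner.call("delete_file", {"path": "/tmp/scratch"}, rationale="clean up")

for rec in runner.log:
    print(rec.step, rec.tool, rec.ok, rec.error)
print(runner.root_causes(1))  # [0]
```

Here the summarize step fails only because its input came from a failed search, so `root_causes(1)` points at step 0 rather than at the step where the error surfaced.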
Open Challenges and Future Work
The survey also discusses benchmarks, datasets, metrics, and open challenges. For enterprise technology leaders evaluating LLM agents for critical applications, these findings underscore the need for systems that provide not just answers but auditable traces of how those answers were derived. Lacking such capabilities, autonomous agents risk being deployed in high-stakes environments without the transparency that trust and compliance require.