LLM-powered agentic systems can handle complex long-horizon tasks, but they are typically locked into static configurations set before execution begins. According to a paper published on arXiv, this rigidity forces a trade-off between domain-specific performance and cross-task generalization: strong priors and compact tool spaces aid specialization but weaken transfer, while broad action spaces dilute guidance. The researchers propose ToolSelf, a runtime self-reconfiguration paradigm that abstracts configuration updates as a standardized tool interface, unifying execution and adaptation within a single policy's action space.
How ToolSelf Works
ToolSelf treats configuration changes the same way it treats any other tool call. During task execution, the agent can dynamically update its:
- Sub-goals
- Strategies
- Toolboxes
- Context
- Context-management modes
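A minimal Python sketch of this idea follows. It rests on stated assumptions: the hypothetical `AgentConfig` dataclass, its field names, the `reconfigure` and `dispatch` functions, and the placeholder `web_search` tool are illustrations, not the paper's interface. What it shows is that a configuration update can sit in the same tool registry as ordinary task tools, so a single policy chooses between "do the task" and "change how I do the task" as peers.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class AgentConfig:
    # Hypothetical fields mirroring the five configurable aspects listed above.
    sub_goals: List[str] = field(default_factory=list)
    strategy: str = "default"
    toolbox: List[str] = field(default_factory=list)
    context: List[str] = field(default_factory=list)
    context_mode: str = "full"  # assumed values, e.g. "full" or "summarized"


ALLOWED_KEYS = {"sub_goals", "strategy", "toolbox", "context", "context_mode"}


def reconfigure(config: AgentConfig, **updates: Any) -> AgentConfig:
    """Return a new config with the requested fields replaced (hypothetical tool)."""
    unknown = set(updates) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unknown config fields: {sorted(unknown)}")
    return replace(config, **updates)


def web_search(config: AgentConfig, query: str) -> str:
    # Placeholder task tool; a real agent would call an external service here.
    return f"results for {query!r}"


# Task tools and the reconfiguration tool share one registry, i.e. one action space.
TOOLS: Dict[str, Callable[..., Any]] = {
    "web_search": web_search,
    "reconfigure": reconfigure,
}


def dispatch(config: AgentConfig, name: str, **args: Any):
    """Execute one action and return (possibly updated config, observation)."""
    if name == "reconfigure":
        new_config = TOOLS[name](config, **args)
        return new_config, f"config updated: {sorted(args)}"
    if name not in config.toolbox:
        return config, f"tool {name!r} not in current toolbox"
    return config, TOOLS[name](config, **args)
```

Because `reconfigure` returns a new immutable config instead of mutating state, each step's configuration is explicit in the trajectory, which is one plausible way to keep adaptation traceable alongside execution.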
These updates are driven by task progress and feedback, allowing the agent to adapt without human intervention. The paper argues that prior methods (pre-execution optimization, planner-worker orchestration, and configuration patching) fall short because they decouple adaptation from execution, which causes information loss and fragmented optimization.
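The sketch below illustrates adaptation happening inside the execution loop rather than in a separate planning phase. `run_episode`, `scripted_policy`, and the toy `calculator` tool are hypothetical; the scripted policy is a hand-written stand-in for the learned policy, assumed here only to show that feedback from one step (a missing tool) can trigger a reconfiguration on the next step without leaving the loop.

```python
from typing import Callable, Dict, List, Tuple

Action = Tuple[str, dict]  # (tool name, arguments)


def run_episode(policy: Callable[[dict, List[str]], Action],
                tools: Dict[str, Callable[[dict], str]],
                config: dict, max_steps: int = 10) -> Tuple[dict, List[str]]:
    """Single-policy loop: each step, the policy picks a task tool OR 'reconfigure'."""
    history: List[str] = []
    for _ in range(max_steps):
        name, args = policy(config, history)
        if name == "finish":
            history.append(f"finish: {args.get('answer')}")
            break
        if name == "reconfigure":
            # Adaptation is an ordinary action; its effect is visible on the next step.
            config = {**config, **args}
            history.append(f"reconfigure: {sorted(args)}")
        elif name in config["toolbox"]:
            history.append(f"{name}: {tools[name](args)}")
        else:
            history.append(f"{name}: error, tool not available")
    return config, history


def calculator(args: dict) -> str:
    # Toy tool: sums "a+b+..." expressions of non-negative integers.
    return str(sum(int(t) for t in args["expr"].split("+")))


def scripted_policy(config: dict, history: List[str]) -> Action:
    # Hypothetical stand-in for a learned policy: react to the last observation.
    last = history[-1] if history else ""
    if not history or last.startswith("reconfigure"):
        return "calculator", {"expr": "2+3"}
    if "error" in last:
        # Feedback says the tool is missing: add it to the toolbox, then retry.
        return "reconfigure", {"toolbox": config["toolbox"] + ["calculator"]}
    return "finish", {"answer": last.split(": ", 1)[1]}


final_config, history = run_episode(
    scripted_policy, {"calculator": calculator}, {"toolbox": ["search"]}
)
```

Starting from a toolbox of `["search"]`, the first calculator call fails, the policy adds the tool to its own toolbox, retries, and finishes with `"5"`. In a planner-worker split, the same failure would typically have to be reported back to a separate planner before the configuration could change.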
Configuration-Aware Two-stage Training (CAT)
To operationalize self-reconfiguration, the researchers introduce Configuration-Aware Two-stage Training (CAT). This approach combines:
- Rejection sampling fine-tuning
- Trajectory-level KTO reinforcement learning
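The first stage can be pictured as ordinary rejection sampling: draw several rollouts per task and keep only the successful ones as supervised fine-tuning data. This sketch assumes a scalar reward with a pass threshold and a hypothetical `sample_fn` rollout function; the paper's sampling budget, reward definition, and any configuration-specific filtering are not described in this summary, so treat `k` and `threshold` as placeholders.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Trajectory:
    task_id: str
    steps: List[str]  # agent actions, including any reconfiguration calls
    reward: float


def rejection_sample(tasks: List[str],
                     sample_fn: Callable[[str, random.Random], Trajectory],
                     k: int = 4, threshold: float = 1.0,
                     seed: int = 0) -> List[Trajectory]:
    """Draw k rollouts per task and keep only those meeting the reward threshold.

    The kept trajectories would then serve as the fine-tuning set for stage one.
    """
    rng = random.Random(seed)  # seeded so the sampled dataset is reproducible
    kept: List[Trajectory] = []
    for task in tasks:
        for _ in range(k):
            traj = sample_fn(task, rng)
            if traj.reward >= threshold:
                kept.append(traj)
    return kept
```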
CAT internalizes the ability to reconfigure, enabling the agent to learn when and how to adjust its configuration based on its current state. The training process is designed so that self-reconfiguration emerges from learning rather than being engineered by hand.
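For the second stage, the sketch below writes out the standard KTO (Kahneman-Tversky Optimization) loss applied at the trajectory level. It assumes that "trajectory-level" means each whole trajectory is labeled desirable or undesirable and its log-probability ratio against a frozen reference policy is summed over the trajectory's tokens; the default `beta` and `lambda` weights are illustrative, not the paper's settings.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def trajectory_log_ratio(policy_logps: Sequence[float],
                         ref_logps: Sequence[float]) -> float:
    """r = log pi_theta(y|x) - log pi_ref(y|x), summed over the trajectory's tokens.

    Assumption: only agent-generated tokens are included, not tool outputs.
    """
    return sum(policy_logps) - sum(ref_logps)


def kto_loss(log_ratios: List[float], desirable: List[bool], z0: float,
             beta: float = 0.1, lambda_d: float = 1.0,
             lambda_u: float = 1.0) -> float:
    """Mean KTO loss over a batch of whole trajectories.

    z0 is the reference point, an estimate of KL(pi_theta || pi_ref); it is
    treated as a constant (no gradient) and clamped at zero.
    """
    z0 = max(z0, 0.0)
    losses = []
    for r, good in zip(log_ratios, desirable):
        if good:
            # Desirable: loss falls as r rises above the reference point.
            losses.append(lambda_d * (1.0 - sigmoid(beta * (r - z0))))
        else:
            # Undesirable: loss falls as r drops below the reference point.
            losses.append(lambda_u * (1.0 - sigmoid(beta * (z0 - r))))
    return sum(losses) / len(losses)
```

KTO needs only a binary desirable/undesirable label per trajectory rather than paired preferences, which fits agent rollouts that simply succeed or fail. A real training run would compute these quantities in an autodiff framework so that gradients flow through the policy's log-probabilities.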
Performance Gains
ToolSelf was evaluated across diverse benchmarks. In zero-shot evaluations, it rivals task-specialized agents; after CAT training, it gains an average of 28.8 points over the static-configuration baseline. The authors read these results as a path toward emergent adaptivity that does not depend on manually injected guidance.
| Configuration | Reported result |
|---|---|
| Static-configuration baseline | Reference point |
| Zero-shot ToolSelf | Rivals task-specialized agents |
| ToolSelf with CAT | +28.8 points (average) vs. baseline |
Implications for Enterprise AI
For enterprise technology decision-makers, ToolSelf points toward AI agents that can autonomously adjust their own tool sets and strategies mid-task. This could reduce the overhead of manual configuration tuning in automation pipelines, though the research remains at an academic stage. The paper's authors include Jingqi Zhou, Sheng Wang, Dezhao Deng, Junwen Lu, Junwei Su, Qintong Gao, Jiahui Wu, Hao Wu, Jiyue Jiang, Lingpeng Kong, and Dunhong Chuan. The authors state that the code is publicly available, enabling further experimentation by the AI community.