Deep research and agent evolution are two critical tasks for AI agents moving toward artificial general intelligence, but each faces distinct limitations. A new arXiv paper titled "Hybrid Open-Ended Tri-Evolution Makes Better Deep Researcher" proposes the Hybrid Open-Ended Tri-Evolution (HOTE) framework to bridge the two. HOTE uses hybrid-mode reinforcement learning to co-evolve three modules (a proposer, a solver, and a judge) on web-scale knowledge.
The Challenge of Static Deep Research
Deep research lets AI agents autonomously retrieve and integrate information in open-ended environments to tackle research tasks that have no single correct answer. According to the paper, however, this capability is static in current agent systems: it is fixed in the model's parameters. Agent evolution, by contrast, lets agents interact with the environment autonomously and use the resulting experience to improve the model. Its effectiveness has been widely validated only on verifiable tasks with standard answers, which leaves a gap when the tasks are open-ended research problems.
The Hybrid Open-Ended Tri-Evolution (HOTE) Framework
The HOTE framework addresses this gap by combining the strengths of both tasks. It uses hybrid-mode reinforcement learning to co-evolve three modules: a proposer, which generates research directions; a solver, which retrieves and synthesizes information; and a judge, which evaluates the quality of outputs. This tri-evolution process draws on web-scale knowledge and moves toward autonomously evolving agents in open-ended tasks and environments. The authors (Piao Hongming, Liu Chi, Chen Mengzhuo, Shu Yan, Wang Xidong, Derek Wei, Ying Dai, and Bryan) designed the framework to connect deep research with agent evolution.
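The division of labor among the three modules can be pictured as a simple loop. The Python sketch below is illustrative only and is not the authors' implementation: every name in it (`Episode`, `run_round`, `collect_episodes`, and the `toy_*` functions) is hypothetical, and the toy functions stand in for what would be learned models. The hybrid-mode reinforcement learning update is not shown, because this article does not describe its details.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Episode:
    """One proposer -> solver -> judge round (hypothetical structure)."""
    task: str
    report: str
    score: float


def run_round(
    propose: Callable[[], str],
    solve: Callable[[str], str],
    judge: Callable[[str, str], float],
) -> Episode:
    task = propose()             # proposer: pose an open-ended research task
    report = solve(task)         # solver: retrieve and synthesize a report
    score = judge(task, report)  # judge: rate the report without a gold answer
    return Episode(task, report, score)


def collect_episodes(
    propose: Callable[[], str],
    solve: Callable[[str], str],
    judge: Callable[[str, str], float],
    rounds: int,
) -> List[Episode]:
    """Gather scored episodes that a training step could later consume."""
    return [run_round(propose, solve, judge) for _ in range(rounds)]


# Toy stand-ins so the sketch runs end to end (assumptions, not real modules).
def toy_propose() -> str:
    return "Compare two approaches to web-scale retrieval"


def toy_solve(task: str) -> str:
    return f"Report on: {task}"


def toy_judge(task: str, report: str) -> float:
    # Placeholder heuristic: reward reports that mention the task text.
    return 1.0 if task in report else 0.0


episodes = collect_episodes(toy_propose, toy_solve, toy_judge, rounds=3)
```

In this sketch the judge's score plays the role that a standard answer plays in verifiable tasks. That is one way to read how a judge module could let evolution proceed on open-ended tasks.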
Experimental Validation
The paper reports extensive experiments on three long-form deep research benchmarks. An 8B-parameter model trained with HOTE surpasses the strongest static open models in the 8–32B range, as well as models trained with state-of-the-art deep research training methods, while requiring less time overhead. The result points to the efficiency of the approach: a smaller model reaches better performance through the collaborative evolution of its three modules rather than through a larger parameter count alone.
| Model or method | Reported result | Time overhead |
|---|---|---|
| HOTE 8B | Outperforms static open 8–32B models and SOTA deep research training methods | Lower than the compared methods |
| Static open 8–32B models | Outperformed by HOTE 8B | Not specified |
| SOTA deep research training methods | Outperformed by HOTE 8B | Higher than HOTE 8B |
Key Findings
The study further verifies that evolving all three modules in HOTE is indispensable: the proposer, the solver, and the judge each contribute to the overall performance gain. The paper states that the HOTE framework "leverages hybrid-mode reinforcement learning to facilitate the collaborative evolution of a proposer, solver and judge based on web-scale knowledge, moving toward autonomous evolving agents in open-ended tasks and environments." This marks a step toward more capable and efficient AI systems for open-ended research, and it addresses a key limitation of current agent evolution methods, which have been validated mainly on tasks with standard answers.
The reported results suggest that combining deep research and agent evolution through structured collaboration can let a smaller model outperform larger static ones, at least on the three benchmarks tested. For enterprise technology leaders exploring AI agents for complex research and analysis tasks, the approach offers a potential path to more efficient and autonomous systems that tackle open-ended problems without depending on larger models alone.