Enterprises managing large document repositories face a fundamental trade-off. Retrieval systems like BM25 or ColBERT scale well but expose only ranked results, which limits granular verification and cross-document analysis. Direct Corpus Interaction (DCI) removes that limit by enabling shell-level operations on the full corpus, but it becomes slow and unstable as the corpus grows. A new paper on arXiv introduces Dr-DCI (Retriever-Steered Direct Corpus Interaction), a framework that combines the broad recall of retrieval with the precision of direct manipulation without sacrificing scalability.
The Challenge of Large-Corpus Agentic Search
Agentic search systems, in which autonomous agents retrieve and reason over documents, rely on retriever-mediated interfaces for scalable candidate discovery. According to the paper, these interfaces expose evidence only as ranked results or bounded document views, limiting an agent's ability to reorganize material and verify constraints across documents. DCI addresses this by exposing shell-executable corpus operations for flexible search, filtering, comparison, and verification. However, terminal commands that run over the full corpus lose both performance and efficiency as the corpus grows: the paper notes that raw DCI becomes "slow and unstable" at scale.
How Dr-DCI Works
Dr-DCI treats retrieval as an agent-callable action for expanding a local workspace. Rather than operating directly over the full corpus, the agent dynamically pulls relevant documents into an evolving workspace and conducts DCI operations within it. This design combines retriever-level recall with DCI-style precision: retrieval keeps exploration scalable, while DCI preserves the local operations needed for effective evidence resolution. The framework thus balances the strengths of both approaches.
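The workspace idea described above can be illustrated with a minimal sketch. This is not the paper's implementation: the toy corpus, the term-overlap `retrieve` function, and the `Workspace` class with its `expand` and `grep` methods are all hypothetical names chosen for illustration. It only shows the general pattern of calling retrieval to grow a local document set and then running a grep-style operation over that set rather than over the full corpus.

```python
import re
from collections import Counter

# Toy corpus; every document here is invented for illustration.
CORPUS = {
    "doc1.txt": "The 2019 audit found the turbine supplier was Acme Wind.",
    "doc2.txt": "Acme Wind was founded in 2004 in Aarhus.",
    "doc3.txt": "A recipe for sourdough bread with rye flour.",
    "doc4.txt": "The turbine contract was signed in 2018.",
}


def tokenize(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query, k=2, exclude=()):
    """Hypothetical retriever: rank documents by term overlap with the query.

    A real system would use BM25 or a dense retriever instead (assumption).
    """
    q = Counter(tokenize(query))
    scored = []
    for name, text in CORPUS.items():
        if name in exclude:
            continue
        d = Counter(tokenize(text))
        score = sum(min(q[t], d[t]) for t in q)
        if score > 0:
            scored.append((score, name))
    # Highest score first; ties broken by file name for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]


class Workspace:
    """Local, growing set of documents the agent operates on."""

    def __init__(self):
        self.docs = {}

    def expand(self, query, k=2):
        """Retrieval as an action: pull new matching documents into the workspace."""
        names = retrieve(query, k=k, exclude=self.docs)
        for name in names:
            self.docs[name] = CORPUS[name]
        return names

    def grep(self, pattern):
        """DCI-style operation, restricted to the workspace rather than the corpus."""
        rx = re.compile(pattern, re.IGNORECASE)
        return sorted(name for name, text in self.docs.items() if rx.search(text))


ws = Workspace()
first = ws.expand("turbine supplier")
hits_before = ws.grep(r"acme")
second = ws.expand("Acme Wind founded")
hits_after = ws.grep(r"acme")
```

In this sketch the second `expand` call skips documents already in the workspace, so each retrieval step only adds new material, and `grep` never touches documents that were not pulled in.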
Experimental Results
The paper reports experiments on several benchmarks. On BrowseComp-Plus, Dr-DCI reaches 71.2% accuracy, improving over raw DCI and ablated variants by up to 8.3 percentage points while reducing tool usage, wall time, and estimated cost. With a workspace-preserving context reset, accuracy rises further to 73.3%. The following table summarizes key results:
| Method | Accuracy on BrowseComp-Plus | Notes |
|---|---|---|
| Raw DCI (baseline) | Not stated here | Full-corpus DCI without retrieval steering |
| Dr-DCI (standard) | 71.2% | Up to 8.3 points above raw DCI and ablated variants |
| Dr-DCI (context reset) | 73.3% | With workspace-preserving context reset |
In corpus-scaling experiments, Dr-DCI remained effective from 100K to 10 million documents, whereas raw DCI became unstable and BM25 performed substantially worse. Dr-DCI also scaled to a 20-million-scale file-per-document setting (Wiki-18 QA), achieving an average score of 63.0 across six benchmarks, outperforming retrieval-based and trained search-agent baselines.
Key Components and Ablation Insights
Ablation analysis revealed that ranked previews and inter-document DCI are key to performance. Ranked previews provide the agent with a concise list of relevant excerpts, while inter-document DCI enables comparisons and verification across multiple documents. Removing either component significantly degraded accuracy, confirming their importance in the framework's design.
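To make the two components concrete, here is a small sketch of what a ranked preview and an inter-document check could look like. The paper's actual formats are not described in this article, so everything below is an assumption: the sample documents, the `ranked_previews`, `years_by_doc`, and `conflicting` helpers, and the choice of "best-matching sentence" as the excerpt are hypothetical.

```python
import re

# Invented sample documents for illustration only.
DOCS = {
    "a.txt": "Contract 17 was signed in 2018 by Acme Wind. Payment terms are net 30.",
    "b.txt": "An internal memo says contract 17 was signed in 2019.",
    "c.txt": "Cafeteria menu for the week.",
}


def tokens(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def best_sentence(query_terms, text):
    """Pick the sentence sharing the most terms with the query (first wins on ties)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return max(sentences, key=lambda s: len(query_terms & set(tokens(s))))


def ranked_previews(query, docs):
    """Hypothetical ranked preview: (file name, excerpt) pairs, best match first."""
    q = set(tokens(query))
    rows = []
    for name, text in docs.items():
        score = len(q & set(tokens(text)))
        if score == 0:
            continue  # documents with no overlap are left out of the preview
        rows.append((score, name, best_sentence(q, text)))
    rows.sort(key=lambda r: (-r[0], r[1]))
    return [(name, excerpt) for _, name, excerpt in rows]


def years_by_doc(docs, names):
    """Inter-document operation: extract four-digit years from each named document."""
    return {n: sorted(set(re.findall(r"\b(?:19|20)\d{2}\b", docs[n]))) for n in names}


def conflicting(values_by_doc):
    """True when the documents do not all report the same set of values."""
    return len({tuple(v) for v in values_by_doc.values()}) > 1


previews = ranked_previews("contract 17 signed", DOCS)
names = [name for name, _ in previews]
years = years_by_doc(DOCS, names)
has_conflict = conflicting(years)
```

The preview narrows attention to two files, and the cross-document step then surfaces that they disagree on the signing year, which is the kind of constraint check a ranked list alone would not reveal.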
For technology leaders evaluating AI-powered search for enterprise knowledge management, Dr-DCI offers a practical path to more accurate and reliable agentic search without sacrificing efficiency. The framework demonstrates that combining retrieval-level recall with direct corpus operations can effectively scale to tens of millions of documents, a capability increasingly critical for industries dealing with large regulatory, technical, or legal repositories.