Dr-DCI: New Framework Combines Retrieval and Direct Corpus Interaction for Scalable Enterprise Search

A new research paper introduces Dr-DCI, a retriever-steered framework that scales direct corpus interaction by dynamically expanding a local workspace. Experiments show accuracy gains of up to 8.3 percentage points over raw DCI and ablated variants, with stable performance from 100K to 10M documents.

iGEN Editorial

June 16, 2026


Enterprises managing large document repositories face a fundamental trade-off: retrieval systems like BM25 or ColBERT scale well but expose only ranked results, limiting granular verification and cross-document analysis. Direct Corpus Interaction (DCI) overcomes this by enabling shell-level operations on the full corpus, but it becomes slow and unstable as the corpus grows. A new research paper posted to arXiv introduces Dr-DCI (Retriever-Steered Direct Corpus Interaction), a framework that combines the broad recall of retrieval with the precision of direct manipulation without sacrificing scalability.

The Challenge of Large-Corpus Agentic Search

Agentic search systems, in which autonomous agents retrieve and reason over documents, rely on retriever-mediated interfaces for scalable candidate discovery. According to the paper, these interfaces expose evidence only as ranked results or bounded document views, limiting an agent's ability to reorganize material and verify constraints across documents. DCI addresses this by exposing shell-executable corpus operations for flexible search, filtering, comparison, and verification. However, terminal commands run over the full corpus become slower and less reliable as the corpus grows. The paper notes that raw DCI becomes "slow and unstable" at scale.
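
To make the scaling problem concrete, here is a minimal Python sketch (not code from the paper) of what a full-corpus, DCI-style filter amounts to: a pattern match applied to every document, much like running `grep -l` over a directory of files. The in-memory `corpus` dictionary and the `grep_corpus` function are hypothetical stand-ins for the files and shell commands the paper describes.

```python
import re


def grep_corpus(corpus: dict[str, str], pattern: str) -> list[str]:
    """Return ids of documents whose text matches `pattern` (similar to `grep -l -E`).

    Every document in the corpus is scanned, so the cost grows linearly with corpus size.
    """
    regex = re.compile(pattern)
    return [doc_id for doc_id, text in corpus.items() if regex.search(text)]


# Hypothetical toy corpus standing in for a directory with one file per document.
corpus = {
    "doc1": "The 2023 audit found no violations.",
    "doc2": "Policy updated in 2024 after the audit.",
    "doc3": "Quarterly revenue grew 4%.",
}
print(grep_corpus(corpus, r"audit"))  # ['doc1', 'doc2']
```

Because the loop touches every document, its cost rises in step with the corpus, which is one plausible reason raw DCI slows down at multi-million-document scale.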

How Dr-DCI Works

Dr-DCI treats retrieval as an agent-callable action for expanding a local workspace. Rather than operating directly over the full corpus, the agent dynamically pulls relevant documents into an evolving workspace and conducts DCI operations within it. This design combines retriever-level recall with DCI-style precision: retrieval keeps exploration scalable, while DCI preserves the local operations needed for effective evidence resolution.
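
The retrieve-then-operate loop described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the `Workspace` class, its `expand` and `grep` methods, and the toy inverted-index scoring (a stand-in for a real retriever such as BM25 or ColBERT) are all hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercased word tokens."""
    return re.findall(r"\w+", text.lower())


class Workspace:
    """Hypothetical retriever-steered workspace (illustrative, not the paper's code)."""

    def __init__(self, corpus: dict[str, str]):
        self.corpus = corpus
        # Inverted index: term -> ids of documents containing it (built once, up front).
        self.index: dict[str, set[str]] = {}
        for doc_id, text in corpus.items():
            for term in set(tokenize(text)):
                self.index.setdefault(term, set()).add(doc_id)
        self.docs: dict[str, str] = {}  # the local workspace, initially empty

    def retrieve(self, query: str, k: int) -> list[str]:
        """Toy ranking: count distinct query terms per document (stand-in for BM25/ColBERT)."""
        scores: dict[str, int] = {}
        for term in set(tokenize(query)):
            for doc_id in self.index.get(term, set()):
                scores[doc_id] = scores.get(doc_id, 0) + 1
        ranked = sorted(scores, key=lambda d: (-scores[d], d))
        return ranked[:k]

    def expand(self, query: str, k: int = 2) -> list[str]:
        """Agent-callable action: add the top-k hits not already present to the workspace."""
        added = [d for d in self.retrieve(query, k) if d not in self.docs]
        for d in added:
            self.docs[d] = self.corpus[d]
        return added

    def grep(self, pattern: str) -> list[str]:
        """DCI-style filter over the workspace only, never the full corpus."""
        regex = re.compile(pattern)
        return sorted(d for d, text in self.docs.items() if regex.search(text))


corpus = {
    "a": "Acme Corp was fined in 2021 for late filings.",
    "b": "Acme Corp appointed a new CFO in 2022.",
    "c": "Globex reported record profits in 2021.",
    "d": "Weather was mild across the region.",
}
ws = Workspace(corpus)
print(ws.expand("Acme fined", k=2))      # ['a', 'b']
print(ws.grep(r"2021"))                  # ['a'] ('c' is not in the workspace yet)
print(ws.expand("Globex profits", k=1))  # ['c']
print(ws.grep(r"2021"))                  # ['a', 'c']
```

In this sketch, local operations stay cheap because they only touch documents that retrieval has already surfaced; document `c` becomes visible to `grep` only after a later retrieval call pulls it in.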

Experimental Results

The paper reports experiments across multiple benchmarks. On BrowseComp-Plus, Dr-DCI reaches 71.2% accuracy, improving over raw DCI and ablated variants by up to 8.3 percentage points while reducing tool usage, wall time, and estimated cost. With a workspace-preserving context reset, accuracy further improves to 73.3%. The following table summarizes key results:

Method	Accuracy on BrowseComp-Plus	Notes
Raw DCI (baseline)	Not reported here	Baseline for comparison
Dr-DCI (standard)	71.2%	Up to 8.3 points over raw DCI and ablated variants
Dr-DCI (context reset)	73.3%	With workspace-preserving context reset
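
The summary does not explain how the workspace-preserving context reset is implemented. One plausible reading, sketched here purely as an assumption, is that the agent's accumulated message history is cleared once it grows too long, while the task and the workspace of retrieved documents are kept. `AgentState`, `maybe_reset`, and the message-count threshold are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    """Hypothetical agent state: the task, the running context, and the workspace."""
    task: str
    context: list[str] = field(default_factory=list)         # messages and tool outputs
    workspace: dict[str, str] = field(default_factory=dict)  # documents pulled in so far

    def maybe_reset(self, max_messages: int) -> bool:
        """Clear the context once it exceeds `max_messages`; return True if a reset happened.

        The threshold rule is an assumption for illustration only.
        """
        if len(self.context) <= max_messages:
            return False
        self.context = []  # drop the long interaction history...
        return True        # ...while self.task and self.workspace are left untouched


state = AgentState(task="Which company was fined in 2021?")
state.workspace["a"] = "Acme Corp was fined in 2021 for late filings."
state.context.extend(["step 1", "step 2", "step 3"])
print(state.maybe_reset(max_messages=2), state.context, list(state.workspace))
# True [] ['a']
```

Under this reading, the agent can resume from a short context without re-retrieving the evidence it has already gathered.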

In corpus-scaling experiments, Dr-DCI remained effective from 100K to 10 million documents, whereas raw DCI became unstable and BM25 performed substantially worse. Dr-DCI also scaled to a file-per-document setting on the order of 20 million documents (Wiki-18 QA), achieving an average score of 63.0 across six benchmarks and outperforming both retrieval-based and trained search-agent baselines.

Key Components and Ablation Insights

Ablation analysis revealed that ranked previews and inter-document DCI are key to performance. Ranked previews provide the agent with a concise list of relevant excerpts, while inter-document DCI enables comparisons and verification across multiple documents. Removing either component significantly degraded accuracy, confirming their importance in the framework's design.
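
The two components named in the ablation can be illustrated with a short Python sketch. The helpers `ranked_preview` and `cross_document_check` are hypothetical, and the fixed-width excerpt and regex constraints are simplifying assumptions rather than the paper's design: the first returns a compact excerpt for each ranked document, and the second checks whether every constraint in a question is supported by at least one workspace document, even when different constraints are satisfied by different documents.

```python
import re


def ranked_preview(ranked_ids: list[str], docs: dict[str, str],
                   width: int = 40) -> list[tuple[str, str]]:
    """Compact view of ranked results: (doc_id, first `width` characters) per document."""
    return [(d, docs[d][:width]) for d in ranked_ids]


def cross_document_check(workspace: dict[str, str],
                         constraints: dict[str, str]) -> tuple[dict[str, list[str]], bool]:
    """Map each named constraint (a regex) to the workspace documents that match it.

    The check passes when every constraint has at least one supporting document,
    even if different constraints are supported by different documents.
    """
    support = {
        name: sorted(d for d, text in workspace.items() if re.search(pattern, text))
        for name, pattern in constraints.items()
    }
    return support, all(support.values())


workspace = {
    "a": "Acme Corp was fined in 2021 for late filings.",
    "b": "Acme Corp appointed a new CFO in 2022.",
}
print(ranked_preview(["a", "b"], workspace, width=20))
# [('a', 'Acme Corp was fined '), ('b', 'Acme Corp appointed ')]
support, ok = cross_document_check(
    workspace, {"fined": r"fined in 20\d\d", "leadership change": r"new CFO"}
)
print(support, ok)  # {'fined': ['a'], 'leadership change': ['b']} True
```

Here neither document alone satisfies both constraints, but together they do, which is the kind of cross-document verification that ranked results alone make hard.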

For technology leaders evaluating AI-powered search for enterprise knowledge management, Dr-DCI offers a practical path to more accurate and reliable agentic search without sacrificing efficiency. The framework demonstrates that combining retrieval-level recall with direct corpus operations can effectively scale to tens of millions of documents, a capability increasingly critical for industries dealing with large regulatory, technical, or legal repositories.
