Dr-DCI: New Framework Combines Retrieval and Direct Corpus Interaction for Scalable Enterprise Search

A new research paper introduces Dr-DCI, a retriever-steered framework that scales direct corpus interaction by dynamically expanding a local workspace. Experiments show accuracy improvements up to 8.3 points over raw DCI, with stable performance from 100K to 10M documents.

iGEN Editorial
June 16, 2026

Enterprises managing large document repositories face a fundamental trade-off: retrieval systems like BM25 or ColBERT scale well but expose only ranked results, limiting granular verification and cross-document analysis. Direct Corpus Interaction (DCI) overcomes this by enabling shell-level operations on the full corpus, but becomes slow and unstable as corpus size grows. A new research paper posted on arXiv introduces Dr-DCI (Retriever-Steered Direct Corpus Interaction), a framework that combines the broad recall of retrieval with the precision of direct manipulation, without sacrificing scalability.

The Challenge of Large-Corpus Agentic Search

Agentic search systems (autonomous agents that retrieve and reason over documents) rely on retriever-mediated interfaces for scalable candidate discovery. According to the paper, these interfaces expose evidence only as ranked results or bounded document views, limiting an agent's ability to reorganize material and verify constraints across documents. DCI addresses this by exposing shell-executable corpus operations for flexible search, filtering, comparison, and verification. However, terminal commands run over the full corpus lose performance and efficiency as the corpus grows; the paper notes that raw DCI becomes "slow and unstable" at scale.

How Dr-DCI Works

Dr-DCI treats retrieval as an agent-callable action for expanding a local workspace. Rather than operating directly over the full corpus, the agent dynamically pulls relevant documents into an evolving workspace and conducts DCI operations within it. This design combines retriever-level recall with DCI-style precision: retrieval keeps exploration scalable, while DCI preserves the local operations needed for effective evidence resolution.
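
To make the workspace idea concrete, here is a minimal sketch of the pattern described above. It is not the paper's implementation. The toy corpus, the term-overlap retriever (a stand-in for BM25 or a dense model), and the names `retrieve`, `Workspace`, `expand`, and `grep` are all assumptions introduced for illustration.

```python
import re
from dataclasses import dataclass, field

# Toy corpus standing in for a large repository (hypothetical data).
CORPUS = {
    "doc1": "The 2019 audit covers vendor contracts in Europe.",
    "doc2": "Vendor contracts renewed in 2021 include a data clause.",
    "doc3": "Cafeteria menu for the spring quarter.",
    "doc4": "The data clause was amended after the 2021 renewal.",
}


def tokenize(text):
    """Lowercase a string and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, k=2, exclude=()):
    """Hypothetical retriever: rank unseen documents by term overlap."""
    q = tokenize(query)
    scored = [
        (len(q & tokenize(text)), doc_id)
        for doc_id, text in CORPUS.items()
        if doc_id not in exclude
    ]
    # Keep only documents sharing at least one term; break ties by id.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [doc_id for _, doc_id in ranked[:k]]


@dataclass
class Workspace:
    """Local, growing subset of the corpus that the agent operates on."""
    docs: dict = field(default_factory=dict)

    def expand(self, query, k=2):
        """Retrieval as an action: pull new matches into the workspace."""
        new_ids = retrieve(query, k=k, exclude=self.docs)
        for doc_id in new_ids:
            self.docs[doc_id] = CORPUS[doc_id]
        return new_ids

    def grep(self, pattern):
        """DCI-style local operation: scan workspace documents only."""
        rx = re.compile(pattern, re.IGNORECASE)
        return sorted(d for d, text in self.docs.items() if rx.search(text))


ws = Workspace()
first = ws.expand("vendor contracts")      # first retrieval step
second = ws.expand("data clause amended")  # follow-up step adds new docs
hits = ws.grep(r"data clause")             # never touches the full corpus
print(first, second, hits)
```

In this sketch the grep-like scan runs only over documents the agent has already pulled in, so its cost depends on workspace size and not corpus size, which is the intuition behind the scaling claim.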

Experimental Results

The paper reports experiments across multiple benchmarks. On BrowseComp-Plus, Dr-DCI reaches 71.2% accuracy, improving over raw DCI and ablated variants by up to 8.3 percentage points while reducing tool usage, wall time, and estimated cost. With a workspace-preserving context reset, accuracy further improves to 73.3%. The following table summarizes key results:

| Method | Accuracy on BrowseComp-Plus | Notes |
| --- | --- | --- |
| Raw DCI (baseline) | (not given) | Unablated variant |
| Dr-DCI (standard) | 71.2% | Up to 8.3 points improvement over raw DCI |
| Dr-DCI (context reset) | 73.3% | Workspace-preserving context reset |

In corpus-scaling experiments, Dr-DCI remained effective from 100K to 10 million documents, whereas raw DCI became unstable and BM25 performed substantially worse. Dr-DCI also scaled to a 20-million-scale file-per-document setting (Wiki-18 QA), achieving an average score of 63.0 across six benchmarks, outperforming retrieval-based and trained search-agent baselines.

Key Components and Ablation Insights

Ablation analysis revealed that ranked previews and inter-document DCI are key to performance. Ranked previews provide the agent with a concise list of relevant excerpts, while inter-document DCI enables comparisons and verification across multiple documents. Removing either component significantly degraded accuracy, confirming their importance in the framework's design.
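
The two components can be illustrated with a short sketch. It assumes a ranked preview is a ranked list of document ids with short excerpts, and that an inter-document operation is any check spanning more than one document. The sample memos and the functions `preview`, `ranked_previews`, and `shared_years` are hypothetical and are not taken from the paper.

```python
import re

# Two hypothetical workspace documents.
DOCS = {
    "memo_a": "Project Falcon budget was approved in 2021 by the finance board.",
    "memo_b": "The finance board reviewed Falcon in 2023; Falcon was first approved in 2021.",
}


def preview(text, term, width=20):
    """Return an excerpt around the first occurrence of term, or None."""
    m = re.search(re.escape(term), text, re.IGNORECASE)
    if m is None:
        return None
    start = max(0, m.start() - width)
    end = min(len(text), m.end() + width)
    return text[start:end]


def ranked_previews(docs, term):
    """Rank documents by occurrence count of term and attach an excerpt."""
    rows = []
    for doc_id, text in docs.items():
        count = len(re.findall(re.escape(term), text, re.IGNORECASE))
        if count:
            rows.append((count, doc_id, preview(text, term)))
    rows.sort(key=lambda r: (-r[0], r[1]))  # most occurrences first
    return [(doc_id, excerpt) for _, doc_id, excerpt in rows]


def years_in(text):
    """Collect four-digit years (19xx or 20xx) mentioned in a text."""
    return set(re.findall(r"\b(?:19|20)\d{2}\b", text))


def shared_years(text_a, text_b):
    """Inter-document check: years that appear in both texts."""
    return sorted(years_in(text_a) & years_in(text_b))


previews = ranked_previews(DOCS, "falcon")
common = shared_years(DOCS["memo_a"], DOCS["memo_b"])
print(previews)
print(common)
```

The preview step lets an agent decide which documents deserve a closer look without reading them in full, while the cross-document check verifies a constraint that no single ranked result could confirm on its own.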

For technology leaders evaluating AI-powered search for enterprise knowledge management, Dr-DCI offers a practical path to more accurate and reliable agentic search without sacrificing efficiency. The framework demonstrates that combining retrieval-level recall with direct corpus operations can effectively scale to tens of millions of documents, a capability increasingly critical for industries dealing with large regulatory, technical, or legal repositories.

