Bridging the Gap: Enabling Natural Language Queries for NoSQL Databases through Text-to-NoSQL Translation

Researchers have introduced TEND, an execution-verified benchmark for Text-to-NoSQL translation comprising 1,210 MongoDB-native tasks across 11 databases, which they describe as the first whose databases are MongoDB-native by design. They also propose SAG, a Schema-as-Data Grounding solver, to improve query generation for schema-less document stores. Experiments show that LLMs with strong NL2SQL performance degrade substantially on TEND, which the authors take as evidence that Text-to-NoSQL is a distinct problem.

iGEN Editorial

June 16, 2026

For enterprises that rely on NoSQL databases as core data infrastructure, natural-language querying remains underdeveloped. A new research paper posted on arXiv presents TEND (Text-to-NoSQL Dataset) and SAG (Schema-as-Data Grounding), which aim to close this gap for MongoDB aggregation pipelines over schema-less document stores.

According to the paper, correct query generation must recover how a non-relational data model represents entities, nested paths, arrays, missing fields, and dynamic keys. This is harder than generating SQL against a declared relational schema, because document stores such as MongoDB do not require the documents in a collection to share a fixed structure.

The Challenge of Schema-less Document Stores

NoSQL databases are widely used for their flexibility, but that same flexibility complicates natural-language access. The authors, including Lu, Jinwei, Jiawei, Zhang, Chen, Qin, Zhiqian, Haodi, Song, Yuanfeng, Wong, and Raymond Chi-Wing, note that translating natural language requests into executable NoSQL queries requires understanding how entities and relationships are encoded in non-relational models. For example, a query must handle nested arrays, optional and sparse paths, and polymorphic shapes, features that have no direct counterpart in a conventional relational schema.
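To make these features concrete, the sketch below uses a few hypothetical product documents (the field names are illustrative and do not come from TEND) and answers the question "what is the total stock per product across all warehouses?" The aggregation pipeline is written as plain Python data and has not been run against a live server here; the small `total_stock` helper reproduces the same logic in pure Python so the example runs without MongoDB.

```python
# Hypothetical documents; field names are illustrative, not taken from TEND.
products = [
    # "stock" uses dynamic keys: each warehouse name is a key, not a value.
    {"_id": 1, "name": "lamp", "stock": {"berlin": 4, "tokyo": 7},
     "reviews": [{"stars": 5}, {"stars": 3}]},
    # Sparse path: this document has no "reviews" field at all.
    {"_id": 2, "name": "desk", "stock": {"berlin": 2}},
    # Missing "stock" field and an empty nested array.
    {"_id": 3, "name": "sofa", "reviews": []},
]

# One possible MongoDB aggregation pipeline for the question (an assumption,
# shown as data only). $objectToArray turns the dynamic-key map into
# [{k, v}, ...] pairs, and $ifNull supplies an empty document when "stock"
# is missing.
pipeline = [
    {"$project": {
        "name": 1,
        "stock_pairs": {
            "$objectToArray": {"$ifNull": ["$stock", {"$literal": {}}]}
        },
    }},
    {"$project": {"name": 1, "total_stock": {"$sum": "$stock_pairs.v"}}},
]


def total_stock(doc):
    """Pure-Python equivalent of the pipeline for a single document."""
    return sum(doc.get("stock", {}).values())


totals = {doc["name"]: total_stock(doc) for doc in products}
print(totals)  # {'lamp': 11, 'desk': 2, 'sofa': 0}
```

A generator that assumes every document has a `stock` field, or that warehouses live in a column rather than in key names, would produce a query that runs but returns the wrong answer for two of the three documents.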

TEND: An Execution-Verified Benchmark

The paper presents TEND, an execution-verified benchmark with 1,210 MongoDB-native tasks across 11 databases. To the authors' knowledge, TEND is the first Text-to-NoSQL benchmark whose database worlds are MongoDB-native by design. Experts manually defined collection boundaries, nested arrays, optional and sparse paths, polymorphic shapes, and dynamic-key conventions. The worlds are populated with real data and verified through frozen MongoDB execution. This ensures that TEND evaluates schema-less document reasoning rather than SQL-to-MQL transfer.
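The article does not describe how the frozen execution is implemented. As a rough illustration only, one way to freeze a reference result is to store a digest of the rows a gold query returned and check that a later re-execution reproduces it. The `fingerprint` and `is_reproduced` helpers below are hypothetical names, and treating row order as irrelevant is an assumption that would not suit queries with an explicit sort.

```python
import hashlib
import json


def fingerprint(rows):
    """Return an order-insensitive SHA-256 digest of a list of result documents."""
    # Serialize each row with sorted keys so field order inside a row is
    # irrelevant, then sort the serialized rows so row order is irrelevant too.
    canonical_rows = sorted(
        json.dumps(row, sort_keys=True, default=str) for row in rows
    )
    payload = "\n".join(canonical_rows).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def is_reproduced(frozen_digest, rows):
    """Check whether a fresh execution matches the frozen reference digest."""
    return fingerprint(rows) == frozen_digest


# Freeze the gold result once (illustrative rows, not benchmark data).
gold_rows = [
    {"name": "lamp", "total_stock": 11},
    {"name": "desk", "total_stock": 2},
]
frozen = fingerprint(gold_rows)

# A re-run that returns the same rows in a different order still matches.
rerun_rows = [
    {"total_stock": 2, "name": "desk"},
    {"name": "lamp", "total_stock": 11},
]
print(is_reproduced(frozen, rerun_rows))
print(is_reproduced(frozen, [{"name": "lamp", "total_stock": 12}]))
```

The `default=str` argument is there so that values JSON cannot encode directly, such as dates, are serialized by their string form instead of raising an error.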

SAG: Schema-as-Data Grounding Solver

The authors further introduce SAG, a Schema-as-Data Grounding solver. SAG induces path and value grounding from stored-document evidence before bounded MQL generation, followed by execution-grounded repair and result-consistency selection. Evaluation uses bounded column-tolerant execution accuracy (EXC) as the headline metric, complemented by a graded result-set F1 and a mutually exclusive execution-outcome decomposition.
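The idea of treating stored documents as the schema can be sketched roughly as follows. This is not the authors' implementation: it is a minimal illustration that walks a sample of documents and records, for each dotted path, which value types were seen and in what fraction of documents the path appears. The function names, the `[]` marker for array elements, and the sample documents are all assumptions made for this example.

```python
from collections import defaultdict


def walk(value, prefix, out):
    """Collect (path, type_name) pairs; array elements extend the path with '[]'."""
    if isinstance(value, dict):
        for key, child in value.items():
            walk(child, f"{prefix}.{key}" if prefix else key, out)
    elif isinstance(value, list):
        out.add((prefix, "array"))
        for item in value:
            walk(item, prefix + "[]", out)
    else:
        out.add((prefix, type(value).__name__))


def induce_paths(docs):
    """Map each observed path to its value types and its presence ratio."""
    if not docs:
        return {}
    seen_in = defaultdict(int)   # path -> number of documents containing it
    types = defaultdict(set)     # path -> set of observed type names
    for doc in docs:
        found = set()
        walk(doc, "", found)
        for path in {p for p, _ in found}:
            seen_in[path] += 1
        for path, type_name in found:
            types[path].add(type_name)
    return {
        path: {"types": sorted(types[path]),
               "presence": seen_in[path] / len(docs)}
        for path in sorted(seen_in)
    }


# Hypothetical sample documents (illustrative field names).
sample = [
    {"_id": 1, "name": "lamp", "stock": {"berlin": 4, "tokyo": 7},
     "reviews": [{"stars": 5}, {"stars": 3}]},
    {"_id": 2, "name": "desk", "stock": {"berlin": 2}},
    {"_id": 3, "name": "sofa", "reviews": []},
]

summary = induce_paths(sample)
for path, info in summary.items():
    print(path, info)
```

A summary like this exposes sparse paths (presence below 1.0) and hints at dynamic keys (sibling paths such as `stock.berlin` and `stock.tokyo` that differ only in the last segment), which is the kind of evidence a generator could be given before it writes a query.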

Implications for AI-Powered Data Access

Experiments demonstrate that large language models (LLMs) with strong NL2SQL performance degrade substantially on TEND, validating Text-to-NoSQL as a distinct schema-less document reasoning problem. This finding highlights the need for specialized approaches when applying natural language interfaces to NoSQL databases. For enterprises, this research points to a future where complex querying of diverse data stores becomes more accessible, but significant work remains to match the maturity of SQL-based solutions.

Aspect	NL2SQL (Relational)	Text-to-NoSQL (Document)
Schema	Fixed, known schema	Schema-less, dynamic keys
Data model	Tables and joins	Nested arrays, optional paths
Benchmarks	Mature	First MongoDB-native benchmark (TEND), per the authors
LLM performance	Strong	Substantially degrades

The paper is available on arXiv and represents a foundational step for enabling natural language querying of NoSQL systems, a critical capability for data-driven enterprises managing diverse and flexible data architectures.
