Approximate nearest-neighbour (ANN) search underpins large-scale retrieval systems and retrieval-augmented generation (RAG), yet the research communities that develop its methods rarely cross-reference one another. A new arXiv paper by researcher Sean Moran, titled 'Projection and Quantisation: A Unifying View of Learning to Hash, from Random Projections to the RAG Era', argues that these methods form a single field governed by three design choices. The paper introduces the projection-quantisation-organisation lens and tests it with the BitBudget benchmark, a reproducible measurement suite released as open source.
A Unified Framework for Compact-Code Search
The lens categorises every ANN method by how it places its projections, where it sets quantisation thresholds, and how it organises resulting codes for search. This framework spans classical random projections through modern learned embeddings used in RAG systems. The paper explicitly recasts semantic identifiers of generative retrieval as quantisation codes, bridging historically separate research streams.
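The three design choices can be made concrete with a small sketch. The example below is an illustration of the general idea, not the paper's code or the BitBudget implementation: it assumes Gaussian random hyperplanes for the projection, thresholds at zero for the quantisation, and a flat Hamming-distance scan for the organisation. The function names `fit_projection`, `quantise`, and `hamming_scan` are hypothetical.

```python
import numpy as np

def fit_projection(dim, n_bits, seed=0):
    """Projection: draw random Gaussian hyperplanes (an assumed, data-independent choice)."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((dim, n_bits))

def quantise(x, planes, thresholds=None):
    """Quantisation: one bit per projection, thresholded at zero unless told otherwise."""
    proj = x @ planes
    if thresholds is None:
        thresholds = np.zeros(planes.shape[1])
    bits = (proj > thresholds).astype(np.uint8)
    # Pack 8 bits per byte so an n_bits code occupies n_bits / 8 bytes.
    return np.packbits(bits, axis=-1)

def hamming_scan(query_code, db_codes, k):
    """Organisation: a flat linear scan that ranks codes by Hamming distance."""
    xor = np.bitwise_xor(db_codes, query_code)
    dists = np.unpackbits(xor, axis=-1).sum(axis=-1)
    return np.argsort(dists, kind="stable")[:k]
```

A learned method would swap any of the three pieces: fitted rather than random hyperplanes, data-dependent thresholds, or an inverted index in place of the linear scan.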
Key Findings from the BitBudget Benchmark
The benchmark reports three principal findings:
- Quantisation delivers the largest memory savings. A one-bit code with full-precision re-ranking matches uncompressed quality for six of the seven embedders tested. The scanned code occupies one thirty-second of the float vector's size, a reduction of about 97%.
- Orderings anticipated by the lens recur as embeddings grow larger. Specifically, a learned-embedding regime emerges in which binary codes overtake an inverted-file product quantiser at a matched byte budget.
- Supervision dramatically boosts quality. Given class labels, an eight-byte supervised code more than doubles the retrieval quality of the two-kilobyte task-agnostic float it replaces.
| Method | Memory per vector | Retrieval quality (relative) |
|---|---|---|
| Full-precision float (2 kB) | 2,048 bytes | Baseline (1.0x) |
| One-bit code + full-precision re-ranking | 64 bytes scanned (1/32) | Matches baseline on 6/7 embedders |
| Eight-byte supervised code | 8 bytes (1/256) | >2x baseline quality |
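The one-bit row can be pictured as a two-stage search. The sketch below is a minimal illustration under stated assumptions, not the benchmark's implementation: it binarises by coordinate sign, shortlists by Hamming distance, and re-ranks the shortlist by inner product on the original floats. The names `binarise` and `search_with_rerank` and the `shortlist` parameter are hypothetical.

```python
import numpy as np

def binarise(vectors):
    """One bit per dimension: keep only the sign of each coordinate (assumed threshold of zero)."""
    return np.packbits(vectors > 0, axis=-1)

def search_with_rerank(query, db_float, db_codes, k=10, shortlist=100):
    """Scan compact codes first, then re-rank a shortlist with full-precision vectors."""
    # Stage 1: Hamming distance between the query code and every stored code.
    q_code = binarise(query)
    xor = np.bitwise_xor(db_codes, q_code)
    dists = np.unpackbits(xor, axis=-1).sum(axis=-1)
    shortlist = min(shortlist, len(db_codes))
    cand = np.argpartition(dists, shortlist - 1)[:shortlist]
    # Stage 2: exact inner products, computed only for the shortlisted candidates.
    scores = db_float[cand] @ query
    order = np.argsort(-scores)[:k]
    return cand[order]
```

Only the packed codes are touched in the first stage, which is where the 1/32 figure applies. The second stage reads full-precision vectors, but only for the shortlist.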
Implications for Retrieval-Augmented Generation
For enterprise systems relying on RAG, memory and latency are critical. The finding that a one-bit code with full-precision re-ranking matches uncompressed quality on six of seven embedders, while scanning 1/32 of the bytes, suggests that RAG pipelines could substantially cut the memory touched during search with little loss in retrieval quality. The full-precision vectors must still be kept available for the re-ranking step, so the saving applies to the scanned index rather than to total storage. The supervised result, where 8 bytes outperform 2,048 bytes, indicates that class labels can yield outsized gains, making supervised hashing attractive for domain-specific retrieval tasks where such labels exist.
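To give a sense of scale, the arithmetic below applies the per-vector sizes to a corpus. The corpus size of ten million vectors is a hypothetical figure chosen for illustration, and the 512-dimensional float32 layout is an assumption that happens to produce the two-kilobyte vector; neither number comes from the paper.

```python
def footprint_bytes(n_vectors, dim, bits_per_dim):
    """Bytes needed to store n_vectors codes at bits_per_dim bits per coordinate."""
    return n_vectors * dim * bits_per_dim // 8

N = 10_000_000  # hypothetical corpus size, not from the paper
DIM = 512       # assumed: 512 float32 values = 2,048 bytes per vector

float_bytes = footprint_bytes(N, DIM, 32)  # full-precision float32 vectors
onebit_bytes = footprint_bytes(N, DIM, 1)  # one bit per dimension
supervised_bytes = N * 8                   # eight-byte supervised code

print(f"float32:    {float_bytes / 1e9:.2f} GB")
print(f"one-bit:    {onebit_bytes / 1e9:.2f} GB")
print(f"supervised: {supervised_bytes / 1e9:.2f} GB")
```

Under these assumptions the scanned index shrinks from about 20.48 GB to 0.64 GB with one-bit codes and to 0.08 GB with eight-byte codes, before counting any full-precision vectors kept for re-ranking.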
What This Means for Enterprise Adoption
The paper's unified lens simplifies decision-making for technology leaders evaluating ANN solutions. Instead of navigating fragmented literature, they can use the projection-quantisation-organisation framework as a shared vocabulary for comparing options. Because the BitBudget benchmark is reproducible, teams can test the trade-offs on their own embedders and data. As RAG systems become common in enterprise search, customer support, and knowledge management, this research points to cheaper, faster retrieval with little or no quality loss in the settings the benchmark tested. The shift from task-agnostic floats to compact supervised codes could unlock new deployment scenarios where memory or bandwidth is constrained.