compression

6 stories

Artificial Intelligence #llm #compression

Haiku to Opus in Just 10 bits: LLMs Unlock Large Compression Gains

A new arXiv paper presents methods for compressing LLM-generated text, achieving over 100x reduction in data transfer compared to prior techniques. Lossless compression via domain-adapted LoRA adapters doubles efficiency, while an interactive Question-Asking protocol recovers up to 72% of the capability gap between small and large models using only 10 binary questions.
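As a rough sanity check on the "10 bits" framing in the headline: a protocol limited to 10 yes/no answers can convey at most 10 bits, so it can distinguish among at most 2^10 answer patterns. The sketch below only illustrates that arithmetic. The function name is hypothetical, and this is not the paper's protocol.

```python
def distinct_answer_patterns(num_binary_questions: int) -> int:
    """Upper bound on distinct yes/no answer sequences for n binary questions."""
    if num_binary_questions < 0:
        raise ValueError("number of questions must be non-negative")
    # Each binary answer contributes at most one bit, so n answers give 2**n patterns.
    return 2 ** num_binary_questions


print(distinct_answer_patterns(10))  # 1024
```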

Jun 16, 2026 1 source

DCP-Prune: New Token Pruning Method Preserves AI Model Performance at Ultra-Low Budgets

Technology

Artificial Intelligence #token pruning #llm

Researchers propose DCP-Prune, a two-stage token pruning framework that maintains model accuracy even under ultra-low token budgets. The method retains 92.1% of upper-bound average performance on LLaVA-1.5-7B with just 16 visual tokens, addressing distribution shift issues that plague aggressive pruning.

Jun 16, 2026 1 source

Decision-Aware Memory Cards: Counterfactual-Inspired Context Selection for Tool-Using LLM Agents

Technology

Artificial Intelligence #llms #artificial intelligence

A new framework called the Counterfactual-Inspired Context Layer (CICL) helps LLM agents select and compress context based on decision relevance rather than semantic similarity. In tests on 50 SWE-bench Verified instances, CICL improved hit@1 from 0.58 to 0.78 and saved 44.93 tokens per query through memory cards.

Jun 16, 2026 1 source

Learned Image Compression Framework SPARC Boosts VLA Robot Control Performance in Bandwidth-Limited Settings

Technology

Artificial Intelligence #learned image compression #vision-language-action models

Researchers introduce SPARC (SPatially Adaptive Rate Control), a learned image compression framework tailored for vision-language-action (VLA) models. SPARC adaptively allocates bitrate based on task relevance and uses a tilted rate loss to preserve critical visual patterns. Experiments on the robotic benchmarks RoboCasa365, VLABench, and LIBERO show that SPARC achieves stronger control performance than conventional codecs at the same bitrate, with real-world benefits for remote robot control.

Jun 16, 2026 1 source

Lossy Compression Slashes Storage 39x for Neural Surrogate Models, Study Finds

Technology

Artificial Intelligence #lossy compression #neural networks

A new study quantifies the impact of lossy compression on neural generative surrogate models. It finds that storage can be reduced by up to 39x and training time by up to 3x with negligible effect on model quality, which offers a path to more efficient AI training in data-intensive domains.

Jun 16, 2026 1 source

PolyKV: Layer-Wise KV Cache Compression Boosts LLM Inference Efficiency by Up to 54.5%

Technology

Artificial Intelligence #kv cache #compression

PolyKV is a new framework for compressing the key-value cache in large language model inference. It selects a compression policy per transformer layer and allocates non-uniform cache budgets, outperforming uniform approaches. On LongBench tasks, PolyKV recovers 40%-54.5% of the performance gap between the strongest single-policy baseline and full KV cache.
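To make the "performance gap recovered" figure concrete, here is a minimal sketch assuming the usual definition: the share of the distance between a baseline score and the full-cache score that a method closes. The function name and the scores are hypothetical placeholders for illustration, not numbers from the paper.

```python
def gap_recovered(baseline: float, method: float, full: float) -> float:
    """Fraction of the baseline-to-full gap closed by the method (assumed definition)."""
    if full == baseline:
        raise ValueError("baseline and full scores are equal, so there is no gap")
    return (method - baseline) / (full - baseline)


# Hypothetical scores chosen only to illustrate the arithmetic.
print(f"{gap_recovered(baseline=40.0, method=45.0, full=50.0):.1%}")  # 50.0%
```

Under this reading, a value of 0% means the method matches the baseline and 100% means it matches the full KV cache.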

Jun 16, 2026 1 source