
New ASRD Method Boosts Diffusion LLM Accuracy by 6.4% and Inference Speed by 7.2×

Researchers propose ASRD (Anchor Supervised Revocable Decoding), a training-free framework that improves diffusion LLM accuracy by up to 6.4% and accelerates inference throughput by up to 7.2×. ASRD addresses error propagation and local error reinforcement in revocable decoding by introducing anchor tokens and two complementary mechanisms.

iGEN Editorial
June 16, 2026

Diffusion Large Language Models (dLLMs) promise parallel generation but face a fundamental trade-off between decoding speed and quality. Revocable decoding strategies try to correct mistakes by verifying generated tokens and remasking those that look wrong, but that verification happens against a context mixing reliable and erroneous tokens. This leads to two critical failures: Error Propagation, where newly generated tokens absorb corrupted information from erroneous context, and Local Error Reinforcement, where errors support one another and evade detection. Researchers have now introduced ASRD (Anchor Supervised Revocable Decoding), a training-free framework that operates in the embedding space to address both failures.
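To make the baseline concrete, here is a minimal sketch of one step of a generic confidence-based revocable decoder, the kind of strategy described above. It is not ASRD and is not taken from the paper: the function name `revocable_step`, the top-k commit rule, and the fixed remasking threshold are illustrative assumptions. Committed tokens that survive verification stay in the context for the next step whether or not they are correct, which is the mixed-quality context ASRD is designed to address.

```python
from typing import List, Optional

MASK = None  # marks a position that is still undecided


def revocable_step(
    tokens: List[Optional[int]],
    proposals: List[int],
    confidences: List[float],
    commit_k: int,
    remask_threshold: float,
) -> List[Optional[int]]:
    """One step of a generic revocable decoder (illustrative, not ASRD).

    tokens:      current sequence; MASK marks undecided positions
    proposals:   the model's argmax token for each position this step
    confidences: for masked positions, the probability of the proposal;
                 for committed positions, the probability the model now
                 assigns to the token already there
    """
    out = list(tokens)
    # Verification: revoke committed tokens the model no longer trusts.
    for i, tok in enumerate(tokens):
        if tok is not MASK and confidences[i] < remask_threshold:
            out[i] = MASK
    # Generation: commit the k most confident proposals at positions that
    # were masked at the start of this step (just-revoked ones wait a step).
    masked = [i for i, tok in enumerate(tokens) if tok is MASK]
    masked.sort(key=lambda i: confidences[i], reverse=True)
    for i in masked[:commit_k]:
        out[i] = proposals[i]
    return out
```

For example, `revocable_step([5, None, None, 7], [5, 3, 9, 7], [0.9, 0.8, 0.4, 0.2], 1, 0.3)` returns `[5, 3, None, None]`: position 3 is revoked because its confidence dropped to 0.2, and position 1, the most confident masked slot, is committed.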

ASRD: A Training-Free Framework in Embedding Space

ASRD explicitly decouples the decoding context into trusted Anchor Tokens, identified via temporal consistency, and uncertain candidates. A dynamic Anchor Tokens Cache stores anchor information and reuses it during decoding. According to the paper, published on arXiv, the approach requires no additional training, so it can be integrated into existing dLLM pipelines without modifying model weights.
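The article does not spell out how temporal consistency is measured. One plausible reading, sketched here as an assumption rather than the paper's rule, is that a position becomes an anchor once its predicted token has stayed the same for a fixed number of consecutive decoding steps, and loses that status as soon as the prediction changes. The hypothetical `AnchorCache` class stores each anchor's token and embedding so later steps can reuse them; the `window` parameter is illustrative.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class AnchorCache:
    """Hypothetical anchor tracker: a position becomes an anchor once its
    predicted token has been identical for `window` consecutive steps.
    (Assumed criterion; the paper's exact rule may differ.)"""

    def __init__(self, seq_len: int, window: int = 3) -> None:
        self.window = window
        # Sliding window of recent predictions for each position.
        self.history: List[Deque[int]] = [deque(maxlen=window) for _ in range(seq_len)]
        # position -> (token id, embedding vector), reused in later steps
        self.anchors: Dict[int, Tuple[int, List[float]]] = {}

    def update(self, predictions: List[int], embeddings: List[List[float]]) -> None:
        """Record one decoding step's predictions and refresh the anchor set."""
        for pos, tok in enumerate(predictions):
            hist = self.history[pos]
            hist.append(tok)
            stable = len(hist) == self.window and len(set(hist)) == 1
            if stable:
                self.anchors[pos] = (tok, embeddings[pos])
            else:
                # Not (or no longer) consistent: treat as an uncertain candidate.
                self.anchors.pop(pos, None)

    def candidates(self) -> List[int]:
        """Positions not currently trusted as anchors."""
        return [p for p in range(len(self.history)) if p not in self.anchors]
```

With `window=2`, a position is trusted after two matching predictions in a row. A smaller window promotes anchors sooner at the cost of trusting less-settled tokens; a larger one is more conservative.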

Two Complementary Mechanisms

ASRD introduces two mechanisms that work together. First, Anchor-Guided Generation injects entropy-weighted anchor signals into masked positions, implicitly steering attention toward the reliable global skeleton. Second, Anchor-Perturbed Verification applies orthogonal perturbations to uncertain candidate tokens, destabilizing and remasking errors that are held in place only by fragile local consensus. Together, they target the two failure modes of revocable decoding: guidance counters error propagation into new tokens, and perturbation breaks up mutually reinforcing errors.
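The two mechanisms can be illustrated with toy vectors. This is a loose interpretation under stated assumptions, not the authors' implementation: anchor weights are taken as proportional to exp(-entropy) so confident anchors dominate, the guidance signal is added to masked embeddings with a hypothetical scale `alpha`, "orthogonal" is read as noise with its component along the candidate's own embedding removed, and a nearest-vocabulary-vector lookup (`nearest_token`) stands in for the real model. A candidate whose prediction flips under any such perturbation is treated as resting on fragile consensus and flagged for remasking.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def anchor_guidance(anchor_embs: List[Vector], anchor_probs: List[Sequence[float]]) -> Vector:
    """Entropy-weighted mix of anchor embeddings (assumed form): each weight
    is proportional to exp(-H), so low-entropy anchors contribute more."""
    raw = [math.exp(-entropy(p)) for p in anchor_probs]
    total = sum(raw)
    dim = len(anchor_embs[0])
    return [sum(w / total * e[d] for w, e in zip(raw, anchor_embs)) for d in range(dim)]


def inject(masked_emb: Vector, guidance: Vector, alpha: float = 0.5) -> Vector:
    """Add the scaled anchor signal to a masked position's embedding."""
    return [m + alpha * g for m, g in zip(masked_emb, guidance)]


def orthogonal_noise(v: Vector, scale: float, rng: random.Random) -> Vector:
    """Gaussian noise with its component along v removed, rescaled to `scale`."""
    noise = [rng.gauss(0.0, 1.0) for _ in v]
    vv = sum(x * x for x in v)
    coef = sum(n * x for n, x in zip(noise, v)) / vv if vv else 0.0
    ortho = [n - coef * x for n, x in zip(noise, v)]
    norm = math.sqrt(sum(o * o for o in ortho)) or 1.0
    return [scale * o / norm for o in ortho]


def nearest_token(v: Vector, vocab: List[Vector]) -> int:
    """Toy stand-in for the model: index of the vocab vector with the largest dot product."""
    return max(range(len(vocab)), key=lambda t: sum(a * b for a, b in zip(v, vocab[t])))


def should_remask(cand: Vector, vocab: List[Vector], scale: float = 0.3,
                  trials: int = 8, seed: int = 0) -> bool:
    """Flag a candidate if any orthogonal perturbation changes its predicted token."""
    rng = random.Random(seed)
    base = nearest_token(cand, vocab)
    for _ in range(trials):
        noise = orthogonal_noise(cand, scale, rng)
        if nearest_token([c + n for c, n in zip(cand, noise)], vocab) != base:
            return True
    return False
```

With vocabulary vectors `[1, 0]` and `[0, 1]`, a candidate at `[5.0, 0.0]` keeps its prediction under every perturbation of size 0.3, because the only orthogonal direction is along the second axis. A candidate at `[0.5, 0.49]` sits near the decision boundary, where a perturbation in the wrong direction flips its prediction.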

Performance Gains Measured on Benchmarks

Extensive experiments on math and coding benchmarks demonstrated that ASRD outperforms recent remasking baselines. The results are summarized below:

Accuracy (math and coding benchmarks): up to 6.4% improvement over remasking baselines
Inference throughput: up to 7.2× acceleration

The authors report that these gains require no additional training, since ASRD is applied only at decoding time.

Implications for Enterprise AI Inference

For enterprise technology decision-makers, improving both accuracy and inference speed in dLLMs directly affects operational costs and user experience. A throughput gain of up to 7.2× could substantially reduce hardware requirements or latency for real-time applications, though actual savings will depend on the workload. Because ASRD is training-free, it can also be adopted without the time and expense of retraining a model. While the paper focuses on math and coding tasks, the underlying ideas of identifying anchor tokens and separating them from uncertain candidates could generalize to other domains where parallel decoding is beneficial.
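To see how a throughput multiplier might translate into hardware, consider a deliberately simplified capacity estimate. The function names and every number in the example are hypothetical, and the model assumes both that throughput adds up linearly across replicas and that the full "up to 7.2×" benchmark figure carries over to a production workload; the paper establishes neither.

```python
import math
from typing import Tuple


def replicas_needed(peak_tokens_per_s: float, per_replica_tokens_per_s: float) -> int:
    """Replicas needed to serve a peak load, assuming throughput adds up
    linearly across replicas (a simplification)."""
    return math.ceil(peak_tokens_per_s / per_replica_tokens_per_s)


def speedup_effect(peak: float, baseline_per_replica: float, speedup: float) -> Tuple[int, int]:
    """Replica count before and after applying a per-replica throughput speedup."""
    before = replicas_needed(peak, baseline_per_replica)
    after = replicas_needed(peak, baseline_per_replica * speedup)
    return before, after


if __name__ == "__main__":
    # Hypothetical load: 50,000 tokens/s at peak, 1,000 tokens/s per replica.
    print(speedup_effect(50_000, 1_000, 7.2))
```

Under these assumptions, `speedup_effect(50_000, 1_000, 7.2)` returns `(50, 7)`: fifty replicas before, seven after. Smaller real-world speedups shrink the saving accordingly.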

As organizations deploy large language models in critical business processes, techniques like ASRD that address the reliability-speed trade-off become increasingly important. The research, authored by Yizhen Yao, Qinglin Zhu, Runcong Zhao, Xiangxiang Dai, Yanzheng, Yulan He, and Lin Gui, is available on arXiv and represents a step toward more robust and efficient dLLM inference.
