
Fast-dLLM++ Boosts Diffusion LLM Inference Up to 37% With Fréchet Profile Decoding

Researchers propose Fast-dLLM++, a training-free extension to Fast-dLLM that uses Fréchet profile decoding to select parallel token commit sets from the full confidence profile. Experiments on LLaDA-8B show up to 37% higher throughput at comparable accuracy on benchmarks including GSM8K, MATH, HumanEval, and MBPP.

iGEN Editorial
June 16, 2026

Enterprise adoption of large language models (LLMs) is often constrained by inference latency, the time it takes to generate a response. Diffusion LLMs promise faster generation by producing multiple tokens in parallel, but the decoding step that decides which masked tokens can be committed simultaneously has been a bottleneck. A new research paper from Kasa, Dai, Negi, and Li introduces Fast-dLLM++, a training-free algorithm that raises throughput by up to 37% at comparable accuracy.

The Bottleneck in Diffusion LLM Inference

Diffusion LLMs generate text by starting with a fully masked sequence and iteratively unmasking tokens. The key challenge is determining which tokens can be unmasked in parallel without degrading quality. Prior work, Fast-dLLM, addressed this with KV caching and confidence-guided parallel decoding, but its decoding theory assumed a homogeneous high-confidence threshold. This effectively reduced each candidate set to its weakest selected token, limiting parallelism.
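
To make the weakest-token limitation concrete, here is a minimal Python sketch of a factor-style selector. The article does not give the formula, so the rule used here is an assumption: sort the masked positions by confidence and commit the largest n for which (n + 1) * (1 - c_n) stays below a factor, where c_n is the n-th highest confidence. The function name `weakest_token_commit` and the `factor` parameter are hypothetical, chosen only for illustration.

```python
from typing import List


def weakest_token_commit(confidences: List[float], factor: float = 1.0) -> List[int]:
    """Pick masked positions to commit in parallel (illustrative sketch).

    Assumed rule, not quoted from the paper: commit the largest n such that
    (n + 1) * (1 - c_n) < factor, where c_n is the n-th highest confidence.
    Only the weakest selected token enters the test, so one uncertain token
    caps the size of the whole set.
    """
    # Positions ordered from most to least confident.
    order = sorted(range(len(confidences)), key=lambda i: confidences[i], reverse=True)
    # Always commit at least one token so decoding makes progress.
    n_commit = min(1, len(order))
    for n in range(1, len(order) + 1):
        weakest = confidences[order[n - 1]]
        if (n + 1) * (1.0 - weakest) >= factor:
            break  # the test only gets harder as n grows
        n_commit = n
    return order[:n_commit]


if __name__ == "__main__":
    # Hypothetical per-position confidences for four masked tokens.
    print(weakest_token_commit([0.99, 0.98, 0.60, 0.97], factor=0.25))
```

In this sketch the three confident positions are committed together, and the low-confidence position waits for a later step. The set size depends only on the least confident member, which is the behavior the paragraph above describes.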

Fréchet Profile Decoding: The Innovation

The authors propose Fréchet profile decoding, which selects parallel commit sets from the full sorted confidence profile rather than a single worst-case confidence. This is a heterogeneous-confidence generalization of Fast-dLLM's factor selector — it recovers the previous rule exactly when confidences are equal, and adds a provable heterogeneity bonus when selected tokens have uneven confidences. Importantly, Fast-dLLM++ leaves the model, diffusion process, and cache implementation unchanged, making it a drop-in replacement for existing Fast-dLLM decoding.
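
The sketch below contrasts a weakest-token rule with one possible profile-aware rule. Neither formula is given in the article, so both are assumptions. The profile rule is built on the Fréchet inequality, which bounds the probability that all n selected tokens are correct from below by 1 minus the sum of their individual error probabilities. The specific test used here, sum of (1 - c_i) plus (1 - c_n) below a factor, is a hypothetical choice picked because it reduces to (n + 1) * (1 - c) when all confidences are equal. The paper's actual selector may differ.

```python
from typing import List


def _by_confidence(confidences: List[float]) -> List[int]:
    """Positions ordered from most to least confident."""
    return sorted(range(len(confidences)), key=lambda i: confidences[i], reverse=True)


def weakest_token_commit(confidences: List[float], factor: float) -> List[int]:
    """Assumed baseline: largest n with (n + 1) * (1 - c_n) < factor."""
    order = _by_confidence(confidences)
    n_commit = min(1, len(order))
    for n in range(1, len(order) + 1):
        slack = 1.0 - confidences[order[n - 1]]
        if (n + 1) * slack >= factor:
            break
        n_commit = n
    return order[:n_commit]


def profile_commit(confidences: List[float], factor: float) -> List[int]:
    """Hypothetical profile rule: largest n with sum_i (1 - c_i) + (1 - c_n) < factor.

    The sum runs over the n most confident tokens, so strong tokens contribute
    their small slack instead of being charged the weakest token's slack.
    """
    order = _by_confidence(confidences)
    n_commit = min(1, len(order))
    slack_sum = 0.0
    for n in range(1, len(order) + 1):
        slack = 1.0 - confidences[order[n - 1]]
        slack_sum += slack
        if slack_sum + slack >= factor:
            break
        n_commit = n
    return order[:n_commit]


if __name__ == "__main__":
    uneven = [0.999, 0.999, 0.999, 0.95]  # hypothetical confidences
    equal = [0.9] * 5
    print(len(weakest_token_commit(uneven, 0.2)), len(profile_commit(uneven, 0.2)))
    print(len(weakest_token_commit(equal, 0.35)), len(profile_commit(equal, 0.35)))
```

Under these assumptions the two rules agree when confidences are equal, and the profile rule commits one more token on the uneven example. Because the summed slack never exceeds n times the weakest slack, this profile rule is never stricter than the baseline in the sketch.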

Empirical Results

Experiments were conducted using the LLaDA-8B model on four benchmarks:

Benchmark | Throughput improvement | Accuracy vs. Fast-dLLM
GSM8K | Up to 37% | Comparable
MATH | Up to 37% | Comparable
HumanEval | Up to 37% | Comparable
MBPP | Up to 37% | Comparable

As the paper states, "profile-aware selection improves the accuracy–throughput frontier by exploiting safe parallelism that weakest-token rules miss." The theoretical improvement translates directly into empirical gains, with up to 37% higher throughput at comparable accuracy.

Implications for Enterprise AI

For enterprise technology leaders evaluating LLM deployment, inference speed translates directly into lower costs and faster response times. Fast-dLLM++ requires no additional training or hardware changes — it is a drop-in upgrade for systems already using Fast-dLLM. The code is released publicly (see paper for repository link). While the research focuses on language tasks, the underlying principle of heterogeneous-confidence decoding could apply to any diffusion-based generative model used in data synthesis or document processing within supply chain and logistics applications.
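
A throughput gain and a latency saving are not the same number, which matters when estimating cost. The short calculation below converts the headline figure into time saved per generated token. The baseline of 100 tokens per second is a hypothetical value used only for illustration; it does not come from the paper.

```python
def latency_reduction(throughput_gain: float) -> float:
    """Fractional drop in time per token for a fractional throughput gain."""
    return 1.0 - 1.0 / (1.0 + throughput_gain)


if __name__ == "__main__":
    gain = 0.37           # best-case gain reported in the article
    baseline_tps = 100.0  # hypothetical baseline, tokens per second
    new_tps = baseline_tps * (1.0 + gain)
    print(f"throughput: {baseline_tps:.0f} -> {new_tps:.0f} tokens/s")
    print(f"time per token falls by {latency_reduction(gain):.1%}")
```

At the best-case 37% gain, time per token falls by roughly 27%, since 1 - 1/1.37 is about 0.27. Because the reported figure is an upper bound, savings on a given workload may be smaller.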

The method's ability to improve throughput without degrading accuracy makes it attractive for real-time AI systems where every millisecond counts. As organizations scale AI across customer service, contract analysis, and operational planning, tools like Fast-dLLM++ can help achieve higher efficiency without compromising on quality.

