Topic
machine learning
LM-SPT Uses Semantic Distillation to Improve Speech Tokenization for Language Models
A new speech tokenization method called LM-SPT uses semantic speech-resynthesis distillation to better align discrete speech tokens with language models. The approach outperforms previous semantic-enhanced tokenizers on automatic speech recognition and text-to-speech tasks without sacrificing reconstruction fidelity.
New Research Reveals Distinct Training Dynamics of On-Policy Distillation for Large Language Models
A research paper on arXiv characterizes the training dynamics of on-policy distillation (OPD) for large language models, finding that OPD occupies a distinct update geometry compared to supervised fine-tuning and reinforcement learning with verifiable rewards. The study shows OPD updates affect fewer weights, avoid principal directions, and exhibit subspace locking.
UniSinger: First End-to-End Framework Unifies Song Generation and Singing Voice Conversion
Researchers have introduced UniSinger, the first end-to-end framework that unifies song generation and singing voice conversion with accompaniment co-generation. Built on a multimodal diffusion transformer, it enables zero-shot speaker cloning and fine-grained timbre control across tasks. Experiments demonstrate state-of-the-art performance on both tasks, offering new possibilities for intelligent music production.
Epileptic Seizure Detection via Frequency-Aware Graph Convolutional Networks Achieves 99% Accuracy
A research team has developed a frequency-aware framework for epileptic seizure detection using EEG signals. By decomposing signals into five frequency bands and applying a graph convolutional neural network (GCN), the method achieves up to 99.7% accuracy on specific bands and an overall broadband accuracy of 99.01% on the CHB-MIT dataset, while enhancing neurophysiological interpretability.
Smooth-Basis Models Challenge Tree Ensembles in Tabular Regression Benchmark
A new study from Luciano Gerber and Huw Lloyd benchmarks smooth-basis models (Chebyshev polynomial regressor, anisotropic RBF network, and a hybrid) against tree ensembles and a transformer on 55 tabular regression datasets. The transformer ranks first in accuracy but requires GPUs, while among CPU-viable models, smooth models and tree ensembles are statistically tied, with smooth models showing tighter generalization gaps.
Study Reveals Binary Classifiers That Excel Under Extreme Imbalance Without Rebalancing
A new study on arXiv systematically evaluates binary classifiers under class imbalance without rebalancing techniques. Results show that advanced models such as TabPFN and boosting-based ensembles maintain high performance even as minority class size shrinks, while traditional classifiers deteriorate. The research offers guidance for model selection in imbalanced learning tasks.
Region-Adaptive Sampling Cuts Diffusion Transformer Inference Time by Up to 2.5x With Negligible Quality Loss
Researchers introduce RAS, a training-free sampling method for Diffusion Transformers that updates only the regions of focus at each step and reuses cached results for the rest. According to a new arXiv paper, it achieves up to 2.51x speedup on Lumina-Next-T2I and 2.36x on Stable Diffusion 3 with minimal quality drop. A user study found comparable quality at 1.6x speedup.
Input-Dependent Fisher Information Enables Local Sensitivity Analysis of Medical Image Classifiers
A research paper introduces a local sensitivity analysis framework based on the input-dependent Fisher Information Matrix (iFIM) for medical image classifiers. The method projects input images into high- and low-sensitivity components, showing that high-sensitivity components are more strongly tied to predictive confidence and classification performance. This provides a principled tool for interpreting black-box deep neural networks in medical imaging.
Fast LLM-Based Semantic Filtering: Unified Framework and Adaptive Two-Phase Method Deliver 1.6–2.0x Speed Gains
A new research paper from Kim, Catheland, and Ailamaki introduces a unified framework and adaptive two-phase method for LLM-based semantic filtering. By composing model-free clustering and online-trained proxies adaptively, and using oracle confidence for multiple purposes, the method achieves 1.6–2.0x faster performance than prior cascades while meeting a 90% accuracy target on 95% of queries across three 10K-document corpora.
Wasserstein Equilibrium Decoding Boosts Reliability in Medical Visual Question Answering
Researchers have extended game-theoretic decoding to vision-language models for medical visual question answering, introducing a Wasserstein stopping criterion that improves accuracy by up to 3.5 percentage points and reduces inference iterations by 20% while maintaining reliability.
Vocabulary Dropout Technique Prevents Diversity Collapse in LLM Co-Evolution Training
A new method called vocabulary dropout prevents diversity collapse in co-evolutionary LLM training. Applied to Qwen3 models on mathematical reasoning, it improved solver performance by an average of 4.4 points, with largest gains on competition-level benchmarks.
Bayesian Visualization Helps Humans Negotiate with AI Across Multiple Issues, Study Shows
A new study from researchers Parmar and Silpasuwanchai reveals that human performance in AI-assisted negotiations drops when more than three issues are involved. They developed a Bayesian uncertainty visualization that helps users identify promising agreement zones, improving outcomes and efficiency in a property rental scenario.
Multi-Sequence Verifiers Cut Inference Latency in Half for LLM Reasoning
A new paper by Kim et al. introduces the Multi-Sequence Verifier (MSV), a lightweight verifier that improves calibration for parallel test-time scaling in large language models. MSV enhances best-of-N selection accuracy by up to 6% and enables early-stopping strategies that achieve the same accuracy with less than half the inference latency.
When RAG Hurts: Research Identifies Attention Distraction in Vision-Language AI Models and Proposes Mitigation
A new study on arXiv identifies a previously overlooked failure mode in Retrieval-Augmented Generation (RAG) for Large Vision-Language Models (LVLMs): Attention Distraction (AD). The researchers propose MAD-RAG, a training-free intervention that decouples visual grounding from context integration, achieving absolute accuracy gains of up to 9.20% on standard benchmarks and rectifying up to 74.68% of failures with negligible computational overhead.
OBCache Prunes KV Cache for Efficient Long-Context LLM Inference with Output-Aware Scoring
A new method called Optimal Brain Cache (OBCache) treats key-value cache eviction as a layer-wise structured pruning problem. By measuring token saliency through perturbation in attention outputs, OBCache outperforms heuristic-based approaches on LLaMA and Qwen models, consistently improving long-context accuracy according to the paper.
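The idea of scoring a cached token by its effect on the attention output, rather than by its attention weight alone, can be illustrated with a toy leave-one-out sketch. This is not OBCache's actual scoring rule, which the summary does not spell out; the function names, the single-query setup, and the example vectors are all assumptions made for illustration.

```python
import math

def attention_output(q, keys, values):
    """Single-query softmax attention over the cached keys and values."""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    mx = max(scores)  # subtract the max for numerical stability
    w = [math.exp(s - mx) for s in scores]
    z = sum(w)
    dim = len(values[0])
    return [sum(w[t] * values[t][d] for t in range(len(keys))) / z
            for d in range(dim)]

def output_aware_scores(q, keys, values):
    """Hypothetical saliency: L2 change in the output when token t is dropped."""
    full = attention_output(q, keys, values)
    scores = []
    for t in range(len(keys)):
        pruned = attention_output(q, keys[:t] + keys[t + 1:],
                                  values[:t] + values[t + 1:])
        scores.append(math.sqrt(sum((f - p) ** 2 for f, p in zip(full, pruned))))
    return scores

def keep_top(scores, budget):
    """Indices of the `budget` highest-scoring tokens, in original order."""
    ranked = sorted(range(len(scores)), key=lambda t: scores[t], reverse=True)
    return sorted(ranked[:budget])

# Made-up cache: tokens 0 and 2 are duplicates, token 3 is barely attended to.
q = [1.0, 0.0]
keys = [[2.0, 0.0], [0.0, 1.0], [2.0, 0.0], [-3.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0]]

scores = output_aware_scores(q, keys, values)
kept = keep_top(scores, budget=3)
print(scores, kept)
```

In this toy cache the duplicated tokens receive the largest attention weights, yet dropping either one changes the output less than dropping token 1, because the surviving duplicate covers for it. A ranking based on attention weight alone would order them the other way round.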
New Survey Maps How Evidence Tracing and Execution Provenance Can Make LLM Agents Trustworthy
A new survey on arXiv explores evidence tracing and execution provenance as key mechanisms for ensuring trustworthiness in LLM-based agents. The paper defines a unified framework connecting retrieval grounding, tool-use safety, memory lineage, and failure diagnosis, and reviews benchmarks and open challenges.
New Unifying Lens for Learning to Hash Could Cut Memory Costs in Large-Scale Retrieval
A new arXiv paper from researcher Sean Moran proposes a unifying lens for approximate nearest-neighbour search, framing all methods as variations of projection, quantisation, and organisation. The work introduces the open BitBudget benchmark and finds that quantisation delivers the largest memory savings, with one-bit codes matching uncompressed quality for most embedders at 1/32 the size. The study also shows supervised eight-byte codes can more than double retrieval quality over two-kilobyte floats.
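The 1/32 figure follows from replacing each 32-bit float with a single bit. A minimal sketch of that arithmetic, together with sign-based binarisation and Hamming-distance comparison, is given here. The 512-dimension width, the random vectors, and the helper names are assumptions for illustration and are not taken from the paper or the BitBudget benchmark.

```python
import random

def sign_binarise(vec):
    """Pack the sign of each coordinate into one integer, one bit per dimension."""
    code = 0
    for i, x in enumerate(vec):
        if x > 0:
            code |= 1 << i
    return code

def hamming(a, b):
    """Number of differing bits between two packed codes."""
    return bin(a ^ b).count("1")

def storage_bytes(dims, bits_per_dim):
    """Bytes needed to store one vector at the given precision."""
    return dims * bits_per_dim / 8

DIMS = 512  # hypothetical embedding width; 512 float32 values occupy 2 KB
rng = random.Random(0)
base = [rng.gauss(0, 1) for _ in range(DIMS)]
near = [x + rng.gauss(0, 0.1) for x in base]  # slightly perturbed copy
far = [rng.gauss(0, 1) for _ in range(DIMS)]  # unrelated vector

float_bytes = storage_bytes(DIMS, 32)
bit_bytes = storage_bytes(DIMS, 1)
ratio = float_bytes / bit_bytes

d_near = hamming(sign_binarise(base), sign_binarise(near))
d_far = hamming(sign_binarise(base), sign_binarise(far))
print(float_bytes, bit_bytes, ratio, d_near, d_far)
```

Under these assumptions the perturbed copy stays much closer in Hamming distance than the unrelated vector, which is the property one-bit codes rely on for retrieval.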
Mosaic: Data-Free Knowledge Distillation Framework Uses Mixture-of-Experts to Tackle Heterogeneous Federated Learning
Researchers propose Mosaic, a novel data-free knowledge distillation framework that leverages Mixture-of-Experts (MoE) to overcome model and data heterogeneity in federated learning. Mosaic trains local generative models to synthesize data, forms an MoE from client models, and distills it into a global model. Experiments show consistent outperformance over state-of-the-art approaches on image and multimodal benchmarks.
Study Reveals 27 Error Types in LLM Text-to-SQL, Introduces MapleDoctor Repair Framework
Researchers conducted the first comprehensive study of errors in LLM-based text-to-SQL systems using in-context learning. They identified 27 error types across 7 categories and proposed MapleDoctor, a detection and repair framework that outperforms existing solutions by repairing 13.8% more queries with negligible mis-repairs and reducing repair latency by 67.4%.
Pruning Optimisations Boost LUT-Based Neural Network Scalability and Efficiency
Researchers propose a pruning-optimised Look-Up Table (LUT) matrix multiplication unit (LUT-MU) to address scalability limits in LUT-based neural networks. Deployed on FPGAs, it delivers up to 1.6x throughput improvement and 4.2x energy efficiency gains over CUDA-based implementations, with 1.3 to 2.6x resource savings versus original MADDNESS-based networks.
Tree-like Self-Play Framework Teaches LLMs to Fix Security Flaws in Code Generation
Researchers introduce Tree-like Self-Play (TSP), a framework that treats secure code generation as a fine-grained sequential decision process. TSP significantly outperforms standard supervised fine-tuning (SFT) and reinforcement learning (RL) on Python security benchmarks, achieving a 75.8% pass rate and reducing unseen vulnerabilities by 24.5% while generalising across programming languages.
Research Proposes Task-Based Neurons to Enhance Neural Network Feature Representation
A study published on arXiv introduces a framework for designing task-based neurons inspired by the human brain's neuronal diversity. Using polynomials as base functions, experiments on synthetic data, classic benchmarks, and real-world applications demonstrate competitive performance against state-of-the-art models.
Haiku to Opus in Just 10 bits: LLMs Unlock Large Compression Gains
A new arXiv paper presents methods for compressing LLM-generated text, achieving over 100x reduction in data transfer compared to prior techniques. Lossless compression via domain-adapted LoRA adapters doubles efficiency, while an interactive Question-Asking protocol recovers up to 72% of the capability gap between small and large models using only 10 binary questions.
Boosting Knowledge Graph Foundation Models via Enhanced Negative Sampling
Researchers propose KMAS, an adaptive negative sampling approach that enhances knowledge graph foundation models (KGFMs) by generating hard negative triples from relation embeddings. The method dynamically adjusts the ratio of hard negatives during training, improving performance across 44 datasets without significant extra time or memory.
Gated QKAN-FWP: Quantum-Inspired Sequence Learning Achieves Parameter Efficiency on NISQ Devices
A new quantum-inspired sequence learning model, Gated QKAN-FWP, uses single-qubit data re-uploading circuits to achieve high accuracy with only 12,500 parameters on long-horizon forecasting tasks. The model outperforms classical recurrent networks such as LSTM and WaveNet-LSTM while being deployable on current NISQ quantum hardware from IonQ and IBM.
New Framework TRACED Evaluates LLM Reasoning Using Geometric Stability and Progress
A new research framework called TRACED evaluates LLM reasoning quality by analyzing geometric progress and stability of reasoning traces. It distinguishes correct reasoning from hallucinations based on trajectory patterns, offering a more robust evaluation method than scalar probabilities.
How Scale Design Impacts LLM Metacognition and Enterprise AI Reliability
A study on arXiv reveals that the 0-100 confidence scale typically used with LLMs leads to heavy discretization, with over 78% of responses falling on just three round numbers. Switching to a 0-20 scale improves metacognitive efficiency. The findings have implications for enterprise use of LLMs in supply chain decision-making, where confidence calibration is critical.
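One simple way to quantify this kind of discretization is the share of responses that land on the few most frequent values. The sketch below assumes that definition; the study's own metric may differ, and the sample confidence values are invented purely to exercise the function.

```python
from collections import Counter

def top_k_share(responses, k=3):
    """Fraction of responses that fall on the k most frequent values."""
    counts = Counter(responses)
    return sum(c for _, c in counts.most_common(k)) / len(responses)

# Invented confidence reports on a 0-100 scale (not data from the study).
hypothetical = [90] * 5 + [80] * 3 + [95] * 2 + [70, 85, 60]
share = top_k_share(hypothetical)
print(f"{share:.1%} of responses fall on the three most common values")
```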
Adaptive Memory Crystallization: New AI Architecture Slashes Forgetting by 80% While Boosting Knowledge Transfer by 43%
Researchers have developed Adaptive Memory Crystallization (AMC), a memory architecture for autonomous AI agents that solves the catastrophic forgetting problem in dynamic environments. In tests on Meta-World MT50, Atari, and MuJoCo, AMC improved forward transfer by 34-43% over the strongest baseline, reduced forgetting by 67-80%, and cut memory footprint by 62%.
RaBiT: Residual-Aware Binarization Training for Accurate and Efficient Large Language Models
Researchers propose RaBiT, a quantization framework that resolves pathological feature co-adaptation in residual binarized LLMs. RaBiT delivers state-of-the-art 2-bit accuracy and 4.49x inference speed-up on an RTX 4090, rivaling hardware-intensive Vector Quantization methods.
Fine-Tuning a 7B Advisor on Free-Tier GPUs: Adapter-Handoff Recipe Published with Synthetic Data Reliability Warning
A new paper from Md Millat Hosen presents a method to fine-tune Mistral-7B-Instruct on free Kaggle/Colab GPUs using QLoRA adapter handoff. The evaluation reveals that while the fine-tuned model better matched synthetic training data, it performed worse on advising quality and factuality compared to the base model, with errors traced to the synthetic data pipeline.
SDFLoRA: Selective Decoupled Federated LoRA for Privacy-Preserving Fine-Tuning with Heterogeneous Clients
Federated learning for LLMs faces challenges from heterogeneous client ranks and data distributions. SDFLoRA proposes a structure-aware LoRA framework that decouples updates into shared and private components, enabling stable aggregation, personalization, and improved differential privacy. Experiments show it outperforms existing federated LoRA baselines.
When Does Deep RL Beat Calibrated Baselines? A Benchmark Study on Adaptive Resource Control
A research paper introduces RLScale-Bench, a reproducible benchmark for deep reinforcement learning on adaptive resource control. Testing six DRL algorithms and a calibrated rule-based baseline on Kubernetes autoscaling across six workload patterns, the study finds that the calibrated controller achieves the lowest cost on all workloads, though DRL agents perform better on bursty and flash traffic. Discrete-action DRL algorithms also significantly outperform continuous-action ones in constraint violations.
Fast-dLLM++ Boosts Diffusion LLM Inference Up to 37% With Fréchet Profile Decoding
Researchers propose Fast-dLLM++, a training-free extension to Fast-dLLM that uses Fréchet profile decoding to select parallel token commit sets from the full confidence profile. Experiments on LLaDA-8B show up to 37% higher throughput at comparable accuracy on benchmarks including GSM8K, MATH, HumanEval, and MBPP.
MUZZLE Framework Automates Red-Teaming of Web Agents Against Indirect Prompt Injection Attacks
MUZZLE is an automated agentic framework that evaluates the security of LLM-based web agents against indirect prompt injection attacks. It discovered 44 new attacks across 4 web applications, including cross-application injection and agent-tailored phishing, by adaptively generating context-aware malicious instructions based on agent execution trajectories.
CircuitLasso Enables Scalable Interpretability for Large Language Models at Lower Cost
A new approach called CircuitLasso uses sparse linear regression to learn interpretable circuits in large language models. It achieves structural accuracy comparable to intervention-based methods on benchmark data while dramatically reducing computational cost. The method also reveals relationships among sparse autoencoder features, aiding understanding of how semantic features propagate through models.
MapDream: Task-Driven Map Learning Achieves State-of-the-Art Vision-Language Navigation
Researchers propose MapDream, a framework that learns bird's-eye-view maps directly from navigation objectives rather than hand-crafted reconstruction. The approach achieves state-of-the-art monocular performance on the R2R-CE and RxR-CE benchmarks.
AL-GNN: New Privacy-Preserving Continual Graph Learning Eliminates Replay Buffers and Backpropagation
Researchers propose AL-GNN, a continual graph learning framework that uses analytic learning to avoid replay buffers and backpropagation. It achieves 10% higher average performance on CoraFull, reduces forgetting by over 30% on Reddit, and cuts training time by nearly 50% while preserving data privacy.
CLoVE: New Federated Learning Algorithm Clusters Loss Vectors for Personalization
Researchers propose CLoVE (Clustering of Loss Vector Embeddings), a novel clustered federated learning algorithm that groups clients based on loss patterns. It achieves high cluster recovery in few rounds and state-of-the-art accuracy across supervised and unsupervised tasks.
New EEG Benchmark Promises Standardized Evaluation of Foundation Models
A new benchmark called EEG-FM-Bench aims to standardize evaluation of electroencephalography foundation models (EEG-FMs). It integrates 14 datasets across 10 paradigms and provides tools for gradient and representation analysis. Early experiments reveal critical insights about multi-task learning, pre-training efficiency, and model scaling.
Robot Learning Reveals Emergent 'Self' Subnetwork in Continual Learning Studies
A new arXiv paper proposes a method to quantify an emergent 'self' in robots by identifying invariant subnetworks that persist during continual learning. The study finds that robots learning variable tasks develop a stable subnetwork that, when preserved, aids adaptation, and when damaged, impairs performance—validated across three robot platforms.
New Book on Optimal Transport Offers Machine Learning Practitioners a Unified Framework
A new book titled 'Optimal Transport for Machine Learners' presents a comprehensive overview of optimal transport techniques tailored for machine learning. It covers key concepts such as Kantorovich couplings, Wasserstein distances, Sinkhorn scaling, and gradient flows, providing a mathematical framework for comparing probability measures in ML applications.
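For readers unfamiliar with Sinkhorn scaling, a minimal sketch of the standard entropy-regularised iteration is given here. It is not drawn from the book; the histograms, the squared-distance cost, and the regularisation strength `eps` are assumptions chosen for illustration. The resulting transport cost approximates, rather than equals, the unregularised optimal transport cost.

```python
import math

def sinkhorn(a, b, cost, eps=1.0, iters=500):
    """Entropic optimal transport between histograms a and b.

    Alternately rescales rows and columns of the Gibbs kernel exp(-cost/eps)
    and returns the resulting coupling matrix.
    """
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # Row scaling, then column scaling.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Two histograms on the points 0, 1, 2 with squared-distance cost (assumed).
points = [0.0, 1.0, 2.0]
a = [0.5, 0.3, 0.2]
b = [0.2, 0.3, 0.5]
cost = [[(x - y) ** 2 for y in points] for x in points]

P = sinkhorn(a, b, cost)
transport_cost = sum(P[i][j] * cost[i][j] for i in range(3) for j in range(3))
print(transport_cost)
```

After enough iterations the rows of `P` sum to `a` and the columns to `b`, so `P` is a valid Kantorovich coupling. Smaller values of `eps` bring the result closer to the unregularised solution, at the price of slower convergence.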
Lightweight Distillation of SAM 3 and DINOv3 for Edge-Deployable Livestock Monitoring
Researchers distilled SAM 3's 446M-parameter backbone into a 40.66M-parameter student, achieving 92.29% MOTA and 96.15% IDF1 on the Edinburgh Pig dataset. The pipeline runs on an NVIDIA Jetson Orin NX 16GB with 4.9GB headroom, enabling on-device individual-level livestock monitoring and longitudinal visual analytics.
New Diffusion Model Learns Permutation Distributions with Softer, More Tractable Trajectories
Researchers propose Soft-Rank Diffusion, a discrete diffusion framework that learns probability distributions over permutations more effectively than prior shuffle-based methods. By replacing abrupt shuffle corruption with a structured soft-rank forward process and introducing contextualized generalized Plackett-Luce denoisers, the method achieves consistent gains on sorting and combinatorial optimization tasks, especially for long sequences.
RidgeCut: Reinforcement Learning Framework Optimizes Logistics Network Partitioning with Rings and Wedges
Researchers have developed RidgeCut, a reinforcement learning framework that leverages ring-and-wedge topology to improve graph partitioning for transportation networks. The method consistently outperforms existing approaches in normalized cut metrics and generalizes across graph sizes, offering potential applications in logistics and supply chain network design.
SDS-LoRA: New Low-Rank Adaptation Method Fixes Gradient Distortion in Large Model Fine-Tuning
A new paper on arXiv introduces SDS-LoRA, a low-rank parameterization that overcomes anisotropic gradient scaling in LoRA. By structurally decoupling singular values from the backward pass, SDS-LoRA ensures gradients are only applied through orthonormal bases, improving convergence and reducing the performance gap to full fine-tuning. Experimental results across natural language and vision benchmarks show enhanced adaptation performance.
Self-Gated Clarification Method Boosts AI Accuracy in Complex Tariff Classification
Researchers propose ACTION-RATING, a self-gated clarification formulation that enables hierarchical language agents to decide when to ask for help during decision-making. Tested on Harmonized Tariff Schedule classification across nine LLMs, the method improved Information-Seeking Effectiveness from 50% to 74% and achieved up to +16.2% accuracy gains at the 10-digit level.
Tyler Framework Boosts LLM Reasoning by Up to 14 Points with Smarter Compute Allocation
A new framework called Tyler introduces typed latent reasoning for large language models, learning when to invoke latent computation and how much to allocate. On three backbone LLMs, Tyler improved accuracy by up to 14.49 points over chain-of-thought prompting and up to 4.30 points over competing baselines, while reducing forgetting.
MA-ProofBench: New Benchmark Tests LLMs on Formal Theorem Proving in Mathematical Analysis
Researchers introduce MA-ProofBench, the first formal theorem-proving benchmark dedicated to mathematical analysis. It contains 200 theorems across six topics at two difficulty levels. Evaluations show that even the best model, GPT-5.5, achieves only 16% Pass@8 on undergraduate-level problems and 5% on Ph.D.-level problems, highlighting significant limitations of current LLMs in formal mathematical reasoning.
G-Loss: New Graph-Guided Loss Function Boosts Language Model Fine-Tuning Accuracy
Researchers introduce G-Loss, a graph-guided loss function that leverages global semantic relationships to fine-tune language models more effectively than traditional loss functions, showing improved accuracy and faster convergence on five benchmark datasets.
Learned Image Compression Framework SPARC Boosts VLA Robot Control Performance in Bandwidth-Limited Settings
Researchers introduce SPARC (SPatially Adaptive Rate Control), a learned image compression framework tailored for vision-language-action (VLA) models. SPARC adaptively allocates bitrate based on task relevance and uses a tilted rate loss to preserve critical visual patterns. Experiments on robotic benchmarks RoboCasa365, VLABench, and LIBERO show SPARC achieves stronger control performance than conventional codecs at the same bitrate, with real-world benefits for remote robot control.
Study Reveals Patterns of Pre-Trained Deep Learning Model Reuse in Scientific Research
A new empirical study of 17,718 open-access papers reveals how natural scientists reuse pre-trained deep learning models (PTMs). The study finds that 'Biochemistry, Genetics and Molecular Biology' leads in PTM reuse, 'adaptation' is the most common reuse pattern, and the 'testing' stage of the scientific process benefits most from PTM integration.
LLM Jaggedness Unlocks Scientific Creativity: New Benchmark Reveals Uneven AI Capabilities Can Be Harnessed for Innovation
A new arXiv paper introduces SciAidanBench, a benchmark for measuring the scientific creativity of large language models. The research finds that LLM capabilities are jagged—uneven across tasks and domains—but that this jaggedness can be harnessed through ensemble methods to produce superior scientific ideas.
Cascaded Sparse Autoencoders Enable Hierarchical Visual Concept Learning in Multimodal LLMs
Researchers introduce cascaded sparse autoencoders (CSAEs) that learn hierarchical visual concepts in multimodal large language models. By training a second-level SAE on the decoder weights of the first, CSAEs achieve 'concepts of concepts' without nesting or stacking bottlenecks. Experiments on Qwen3-VL, Gemma-3, and LLaVA show improved interpretability and effective group-level steering.
First Model-Free Universal AI Agent Proved Asymptotically Optimal in General Reinforcement Learning
Researchers introduced Universal AI with Q-Induction (AIQI), the first model-free agent proven asymptotically ε-optimal in general reinforcement learning. Unlike previous model-based optimal agents like AIXI, AIQI performs induction over action-value functions. The proof also establishes optimality for Self-AIXI without ad-hoc assumptions.
AlignCoder Uses Reinforcement Learning to Improve Repository-Level Code Completion by 18%
AlignCoder is a novel framework for repository-level code completion that combines query enhancement with reinforcement learning to train a retriever (AlignRetriever). It addresses misalignment issues in retrieval-augmented generation (RAG) approaches, achieving an 18.1% improvement in Exact Match score on the CrossCodeEval benchmark across multiple code LLMs.
FlowState: New Time-Series Model Handles Any Sampling Rate Without Retraining
IBM Research has developed FlowState, a novel time-series foundation model (TSFM) that is sampling-rate-equivariant, meaning it can handle data sampled at different rates without retraining. The model uses a state space encoder and a functional basis decoder to achieve continuous-time modeling, and it outperforms larger models on the GIFT-Eval benchmark while being one of the smallest TSFMs.
Graphical-Probabilistic Modeling Brings Rigor to LLM-Native Software Engineering
Current LLM-native software development relies on experimentation and heuristics. A proposed framework called Generation Networks uses graphical probabilistic models to document generative flows and enable design-level reasoning, bringing the rigor of traditional software engineering to LLM systems.
ControlMap: Controllable HD Map Generation Using Latent Diffusion for Traffic Simulation
Current autonomous driving simulation is limited by costly HD map creation. ControlMap presents a pipeline using latent diffusion and ControlNet to generate HD maps that follow specific road topologies and city styles. The model introduces novel metrics for adherence and similarity.
Akasha 2 Achieves 4x Faster Visual Synthesis with Hamiltonian-Inspired AI Architecture
Akasha 2 introduces Hamiltonian State Space Duality and Visual-Language Joint Embedding Predictive Architecture, achieving state-of-the-art video prediction with 4x faster synthesis than diffusion models and 3-18x speedup over transformers. The system enforces physical conservation laws for spatiotemporal coherence.
PURe Module Enhances Vision Networks by Adding Multiplicative Local Interactions
Researchers propose PURe, a Product-Unit Residual Module that introduces explicit multiplicative local interactions into deep vision networks. The module serves as a drop-in replacement for native residual units, consistently improving performance on benchmarks like ImageNet and CIFAR-10 while using smaller parameter budgets.