Topic

neural networks

28 stories

Artificial Intelligence #fastmix#data mixture

FastMix: Gradient-Based Data Mixture Optimization Reduces Search Cost in AI Training

FastMix is a novel framework that automates data mixture discovery by training only a single proxy model and jointly optimizing mixture coefficients and model parameters via gradient descent. It reformulates mixture selection as a bilevel optimization problem, enabling efficient, scalable optimization that outperforms baselines.

Jun 17, 2026 1 source

Norm-Agnostic Residual Networks Offer Path to Scaling Adaptive Depth in Deep Learning

Technology

Artificial Intelligence #artificial intelligence#residual networks

Researchers introduce NAG, a norm-agnostic residual architecture that prevents later layers from being suppressed by norm growth. This enables training of much deeper models and introduces an interpretable Mixture-of-Depths mechanism that can serve as a pretraining scaling strategy: models with 20-25% sparsity match the full-depth baseline under equal compute.

Jun 17, 2026 1 source

Lightweight Attention Mechanism Boosts Robust Multimodal Integration in Global Workspace Architecture

Technology

Artificial Intelligence #attention mechanism#multimodal integration

A new arXiv paper introduces a lightweight attention mechanism for multimodal integration in a global workspace architecture. The method improves robustness against corrupted modalities while using far fewer trainable parameters than end-to-end attention baselines. Tests on Simple Shapes and MM-IMDb 1.0 show transferable selection strategies across tasks and unseen modalities.

Jun 17, 2026 1 source

Neural Audio Codecs' Low Frame Rate Degradation Linked to Training Configuration

Technology

Artificial Intelligence #neural audio codecs#low frame rate

A new study by Gichamba and Busogi investigates the mechanisms behind low frame rate degradation in neural audio codecs. The researchers found that a quality cliff at 6.25 Hz is caused by suboptimal training configuration, not by phonemic collisions or codebook saturation. After correcting the training setup, the codecs perform smoothly down to 3.1 Hz and 1.6 Hz, suggesting that low frame rate efficiency gains are more accessible than previously assumed.

Jun 17, 2026 1 source

Epileptic Seizure Detection via Frequency-Aware Graph Convolutional Networks Achieves 99% Accuracy

Technology

Artificial Intelligence #epileptic seizure detection#eeg signals

A research team has developed a frequency-aware framework for epileptic seizure detection using EEG signals. By decomposing signals into five frequency bands and applying a graph convolutional neural network (GCN), the method achieves up to 99.7% accuracy on specific bands and an overall broadband accuracy of 99.01% on the CHB-MIT dataset, while enhancing neurophysiological interpretability.

Jun 17, 2026 1 source

Pruning Optimisations Boost LUT-Based Neural Network Scalability and Efficiency

Technology

Artificial Intelligence #neural networks#pruning

Researchers propose a pruning-optimised Look-Up Table (LUT) matrix multiplication unit (LUT-MU) to address scalability limits in LUT-based neural networks. Deployed on FPGAs, it delivers up to 1.6x throughput improvement and 4.2x energy efficiency gains over CUDA-based implementations, with 1.3x to 2.6x resource savings versus original MADDNESS-based networks.

Jun 16, 2026 1 source

Research Proposes Task-Based Neurons to Enhance Neural Network Feature Representation

Technology

Artificial Intelligence #artificial intelligence#neural networks

A study published on arXiv introduces a framework for designing task-based neurons inspired by the human brain's neuronal diversity. The neurons use polynomials as base functions, and experiments on synthetic data, classic benchmarks, and real-world applications demonstrate competitive performance against state-of-the-art models.

Jun 16, 2026 1 source

Gated QKAN-FWP: Quantum-Inspired Sequence Learning Achieves Parameter Efficiency on NISQ Devices

Technology

Artificial Intelligence #quantum-inspired#sequence learning

A new quantum-inspired sequence learning model, Gated QKAN-FWP, uses single-qubit data re-uploading circuits to achieve high accuracy with only 12,500 parameters on long-horizon forecasting tasks. The model outperforms classical recurrent networks such as LSTM and WaveNet-LSTM while being deployable on current NISQ quantum hardware from IonQ and IBM.

Jun 16, 2026 1 source

Learned Image Compression Framework SPARC Boosts VLA Robot Control Performance in Bandwidth-Limited Settings

Technology

Artificial Intelligence #learned image compression#vision-language-action models

Researchers introduce SPARC (SPatially Adaptive Rate Control), a learned image compression framework tailored for vision-language-action (VLA) models. SPARC adaptively allocates bitrate based on task relevance and uses a tilted rate loss to preserve critical visual patterns. Experiments on robotic benchmarks RoboCasa365, VLABench, and LIBERO show SPARC achieves stronger control performance than conventional codecs at the same bitrate, with real-world benefits for remote robot control.

Jun 16, 2026 1 source

Cascaded Sparse Autoencoders Enable Hierarchical Visual Concept Learning in Multimodal LLMs

Technology

Artificial Intelligence #cascaded sparse autoencoders#multimodal llms

Researchers introduce cascaded sparse autoencoders (CSAEs) that learn hierarchical visual concepts in multimodal large language models. By training a second-level SAE on the decoder weights of the first, CSAEs achieve 'concepts of concepts' without nesting or stacking bottlenecks. Experiments on Qwen3-VL, Gemma-3, and LLaVA show improved interpretability and effective group-level steering.

Jun 16, 2026 1 source

PURe Module Enhances Vision Networks by Adding Multiplicative Local Interactions

Technology

Artificial Intelligence #plug-and-play#product-unit

Researchers propose PURe, a Product-Unit Residual Module that introduces explicit multiplicative local interactions into deep vision networks. The module serves as a drop-in replacement for native residual units, consistently improving performance on benchmarks like ImageNet and CIFAR-10 while using smaller parameter budgets.

Jun 16, 2026 1 source

New Unified Definition of AI Hallucination Pins It on Inaccurate World Modeling

Technology

Artificial Intelligence #hallucination#artificial intelligence

A new arXiv paper by Liu et al. proposes a unified definition of hallucination in large language models: inaccurate internal world modeling that is observable to the user. The framework subsumes prior definitions, distinguishes true hallucinations from planning or reward errors, and introduces the HalluWorld benchmark for stress-testing models.

Jun 16, 2026 1 source

Z-Plane Neural Networks Replace ReLU and LayerNorm with Bounded Geometric Activation

Technology

Artificial Intelligence #z-plane neural networks#bounded geometric activation

Researchers propose Z-Plane Neural Networks, which replace traditional ReLU activations and LayerNorm with a bounded geometric activation called Radial Bounding. This new approach maintains 1-Lipschitz continuity, prevents gradient vanishing, and preserves directional information. A 100-layer Z-Plane MLP achieved 98.34% accuracy on MNIST without any ReLU or LayerNorm, demonstrating numerical stability.

Jun 16, 2026 1 source

New Architecture GRIL Enables Gradient Descent-Like Learning in Linear Recurrent Networks

Technology

Artificial Intelligence #gradient descent#recurrent networks

Researchers introduce the Gradient-based Recurrent In-context Learner (GRIL), a linear recurrent network architecture with windowed cross-product self-attention that can implement minibatch gradient descent on a task-specific predictor in a single forward pass. The design achieves strong performance on synthetic in-context learning tasks, Long Range Arena, and language modeling.

Jun 16, 2026 1 source

New Drift-RAE Method Distills Transformers Efficiently Using Representation Autoencoders

Technology

Artificial Intelligence #transformers#representation autoencoders

A new research paper proposes Drift-RAE, a method for distilling pretrained flow models in representation autoencoder latent spaces. It overcomes anisotropy and large curvature challenges, achieving 1.77 FID on ImageNet 256 with only 10,000 distillation steps, outperforming existing RAE distillation methods.

Jun 16, 2026 1 source

New Research Demystifies Variance in Circuit Discovery of Large Language Models

Technology

Artificial Intelligence #llms#circuit discovery

A new research paper explores variance in circuit discovery of large language models, identifying three types: resampling, rephrasing, and sample-wise variance. The authors propose CEAP, an improved method over EAP-IG with theoretical guarantees, and argue that rephrasing variance makes it hard to find comprehensive circuits, suggesting LLMs may be inherently difficult to steer.

Jun 16, 2026 1 source

New Theory Explains How Deep Transformers Achieve Adaptive Inference Using Function Vectors

Technology

Artificial Intelligence #artificial intelligence#deep learning

A new research paper introduces a theory of deep transformers as mean-field interacting systems that implement distributed inference using 'function vectors' to adaptively infer latent context variables at finer scales over layers. The theory predicts a relationship between non-Gaussian hierarchical structure and transformer depth, tested with constrained linear attention models.

Jun 16, 2026 1 source

Lossy Compression Slashes Storage 39x for Neural Surrogate Models, Study Finds

Technology

Artificial Intelligence #lossy compression#neural networks

A new study quantifies the impact of lossy compression on neural generative surrogate models, finding that storage can be reduced by up to 39x and training time by up to 3x with negligible effect on model quality, offering a path to more efficient AI training in data-intensive domains.

Jun 16, 2026 1 source

Pixel-TTS: Image-Based Text Rendering Improves Robustness in Speech Synthesis

Technology

Artificial Intelligence #text-to-speech#artificial intelligence

Researchers propose Pixel-TTS, the first visually grounded text-to-speech framework that renders text as images and processes them with 2D convolutions. This eliminates embedding matrix expansion during fine-tuning and improves robustness to unseen characters and orthographic variations. Experiments show competitive performance with faster convergence and zero-shot generalization.

Jun 16, 2026 1 source

EEGNet Study Reveals Key Limitations in fNIRS Cognitive Load Classification

Technology

Artificial Intelligence #eegnet#fnirs

A comprehensive study published on arXiv systematically evaluates EEGNet for classifying cognitive load from fNIRS signals. The research highlights critical challenges in generalization, achieving only 56.11% accuracy under subject-independent evaluation, and underscores the importance of segmentation strategy and learning rate selection.

Jun 16, 2026 1 source

Sub-Quadratic Vision Transformers Cut Self-Attention Cost for Faster Image Captioning

Technology

Artificial Intelligence #vision transformers#image captioning

A new arXiv preprint from Ghosh et al. proposes a sub-quadratic vision transformer architecture for image captioning. By replacing standard self-attention with a Gaussian Mixture Model (GMM) clustering mechanism, the model reduces computational complexity from quadratic O(n²) to linear O(nK). The approach uses an autoregressive GPT-based decoder and achieves competitive results on the Flickr30K dataset.

Jun 16, 2026 1 source
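
The complexity claim in the sub-quadratic vision transformer summary above can be made concrete with a little arithmetic. The sketch below only counts attention scores: n*n for standard self-attention versus n*K when each token is scored against K cluster summaries. Reading K as the number of mixture components is an assumption, and the function names, the token counts, and K = 16 are all hypothetical values chosen for illustration, not figures from the paper.

```python
def full_attention_scores(n: int) -> int:
    """Token-to-token scores in standard self-attention: n * n."""
    return n * n


def clustered_attention_scores(n: int, k: int) -> int:
    """Token-to-cluster scores when each of n tokens is compared to k clusters: n * k.

    Assumption: k is the number of mixture components (hypothetical reading of K).
    """
    return n * k


def savings_ratio(n: int, k: int) -> float:
    """How many times fewer scores the clustered variant computes (equals n / k)."""
    return full_attention_scores(n) / clustered_attention_scores(n, k)


if __name__ == "__main__":
    k = 16  # hypothetical number of clusters
    for n in (196, 784, 3136):  # hypothetical patch-token counts
        print(n, full_attention_scores(n), clustered_attention_scores(n, k), savings_ratio(n, k))
```

Because the ratio is n / K, the saving grows with the number of image tokens as long as K stays fixed, which is what separates a linear cost from a quadratic one.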

AI Safety Monitors May Fail After Model Updates, New Benchmarking Study Finds

Technology

Artificial Intelligence #ai safety#model monitoring

A new research paper presents the first systematic test of whether activation monitors remain reliable after common model updates such as quantization and fine-tuning. The study finds that while quantization largely preserves performance, fine-tuning frequently makes monitors stale, with privacy monitors most affected. Degradation is predictable, enabling triaged revalidation.

Jun 16, 2026 1 source

Multi-Encoder-Decoder VAE Enables Cross-Subject Neural Alignment Without Shared Stimuli

Technology

Artificial Intelligence #artificial intelligence#machine learning

A new Multi-Encoder-Decoder Variational Autoencoder (MED-VAE) achieves cross-subject alignment of neural activity without shared stimuli by using a pretrained artificial neural network as a scaffold. Tested on the Natural Scenes Dataset, MED-VAE creates semantically organized common latent spaces and outperforms traditional methods in generalization and cross-subject prediction.

Jun 16, 2026 1 source

AI-driven Landmark-free Assessment of Lower-limb Alignment with Implicit Neural Shape Functions from Knee Radiographs

Technology

Artificial Intelligence #knee radiographs#lower-limb alignment

Researchers propose a landmark-free automated workflow using Implicit Neural Shape Functions (INSF) to assess lower-limb alignment from knee radiographs. The method encodes anatomy into a compact latent space and regresses clinical measurements directly, achieving performance comparable to manual methods and state-of-the-art landmark-based approaches. Trained on 566 radiographs and tested on internal and external datasets, the approach offers flexibility for extension to new tasks.

Jun 16, 2026 1 source

Cortical Geometry and Wiring Serve as Powerful Inductive Biases for Recurrent Neural Networks

Technology

Artificial Intelligence #artificial intelligence#neural networks

A new study leveraging the MICrONS functional connectomics dataset demonstrates that recurrent neural networks initialized with cortical geometry, wiring, and functional relationships consistently outperform baseline and partially constrained models across three decision-making tasks, while exhibiting lower entropy and modular organization.

Jun 16, 2026 1 source

AI-Driven Career Guidance System Achieves 94.71% Accuracy in Predicting Student Paths

Technology

Artificial Intelligence #neural networks#student assessment

Researchers propose a real-time student assessment and career prediction system combining a Career Guidance Expert (CGE) with a web platform. The neural network model achieves 94.71% validation accuracy in recommending career paths for computing students.

Jun 16, 2026 1 source

Multiple Descents in Deep Learning Linked to Order-Chaos Transitions in LSTM Networks, New Research Shows

Technology

Artificial Intelligence #deep learning#lstm

Researchers have observed a 'multiple-descent' phenomenon in LSTM networks, where test performance cycles through ups and downs after overtraining. Asymptotic stability analysis reveals these cycles are linked to order-chaos phase transitions, with the optimal training step at the first transition from order to chaos, where the 'edge of chaos' is widest.

Jun 16, 2026 1 source

Expert Tying Reduces Memory Footprint of Mixture-of-Experts LLMs by Nearly Half

Technology

Artificial Intelligence #tied expert layers#mixture-of-experts

A new arXiv paper from Jaggi proposes Expert Tying, an architectural modification for Mixture-of-Experts LLMs that shares expert parameters across consecutive transformer layers. Pretraining experiments show memory footprint reduction by almost 2x with virtually no degradation in perplexity or downstream quality, evaluated on OLMoE, Qwen3, and DeepSeek-style architectures.

Jun 16, 2026 1 source
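
One way to see why sharing experts across consecutive layers could give a reduction of "almost 2x" rather than exactly 2x is to count parameters. The sketch below is a rough illustration, not the paper's method: it assumes experts are tied in groups of two consecutive layers, that non-expert parameters (attention, routers, norms) stay per-layer, and every number in the example is hypothetical.

```python
def moe_param_count(
    num_layers: int,
    experts_per_layer: int,
    params_per_expert: int,
    non_expert_params_per_layer: int,
    tie_group: int = 1,
) -> int:
    """Total parameters when each group of `tie_group` consecutive layers shares one expert set.

    tie_group=1 means no tying; tie_group=2 is the assumed pairing of adjacent layers.
    """
    # Number of distinct expert sets that must be stored (ceiling division).
    expert_sets = -(-num_layers // tie_group)
    expert_params = expert_sets * experts_per_layer * params_per_expert
    # Non-expert parameters are assumed to remain separate for every layer.
    dense_params = num_layers * non_expert_params_per_layer
    return expert_params + dense_params


if __name__ == "__main__":
    # Hypothetical configuration, chosen only to make the arithmetic easy to follow.
    config = dict(
        num_layers=16,
        experts_per_layer=64,
        params_per_expert=1_000_000,
        non_expert_params_per_layer=4_000_000,
    )
    untied = moe_param_count(**config, tie_group=1)
    tied = moe_param_count(**config, tie_group=2)
    print(untied, tied, untied / tied)
```

In this toy configuration the expert parameters halve exactly, while the untouched per-layer parameters keep the overall ratio somewhat below 2.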