iGEN

Topic

neural networks

28 stories
FastMix: Gradient-Based Data Mixture Optimization Reduces Search Cost in AI Training
Technology | Artificial Intelligence | #fastmix #data mixture

FastMix is a novel framework that automates data mixture discovery by training only a single proxy model and jointly optimizing mixture coefficients and model parameters via gradient descent. It reformulates mixture selection as a bilevel optimization problem, enabling efficient, scalable optimization that outperforms baselines.

Jun 17, 2026 1 source
Norm-Agnostic Residual Networks Offer Path to Scaling Adaptive Depth in Deep Learning
Technology | Artificial Intelligence | #artificial intelligence #residual networks

Researchers introduce NAG, a norm-agnostic residual architecture that prevents later layers from being suppressed by norm growth. This enables training of much deeper models and introduces an interpretable Mixture-of-Depths mechanism that can serve as a pretraining scaling strategy: with 20-25% sparsity, it matches the full-depth baseline under equal compute.

Jun 17, 2026 1 source
Lightweight Attention Mechanism Boosts Robust Multimodal Integration in Global Workspace Architecture
Technology | Artificial Intelligence | #attention mechanism #multimodal integration

A new arXiv paper introduces a lightweight attention mechanism for multimodal integration in a global workspace architecture. The method improves robustness against corrupted modalities while using far fewer trainable parameters than end-to-end attention baselines. Tests on Simple Shapes and MM-IMDb 1.0 show transferable selection strategies across tasks and unseen modalities.

Jun 17, 2026 1 source
Neural Audio Codecs' Low Frame Rate Degradation Linked to Training Configuration
Technology | Artificial Intelligence | #neural audio codecs #low frame rate

A new study by Gichamba and Busogi investigates the mechanisms behind low frame rate degradation in neural audio codecs. The researchers found that a quality cliff at 6.25 Hz is caused by suboptimal training configuration, not by phonemic collisions or codebook saturation. After correcting the training setup, the codecs perform smoothly down to 3.1 Hz and 1.6 Hz, suggesting that low frame rate efficiency gains are more accessible than previously assumed.

Jun 17, 2026 1 source
Epileptic Seizure Detection via Frequency-Aware Graph Convolutional Networks Achieves 99% Accuracy
Technology | Artificial Intelligence | #epileptic seizure detection #eeg signals

A research team has developed a frequency-aware framework for epileptic seizure detection using EEG signals. By decomposing signals into five frequency bands and applying a graph convolutional neural network (GCN), the method achieves up to 99.7% accuracy on specific bands and an overall broadband accuracy of 99.01% on the CHB-MIT dataset, while enhancing neurophysiological interpretability.

Jun 17, 2026 1 source
Pruning Optimisations Boost LUT-Based Neural Network Scalability and Efficiency
Technology | Artificial Intelligence | #neural networks #pruning

Researchers propose a pruning-optimised Look-Up Table (LUT) matrix multiplication unit (LUT-MU) to address scalability limits in LUT-based neural networks. Deployed on FPGAs, it delivers up to 1.6x throughput improvement and 4.2x energy efficiency gains over CUDA-based implementations, with 1.3 to 2.6x resource savings versus original MADDNESS-based networks.

Jun 16, 2026 1 source
Research Proposes Task-Based Neurons to Enhance Neural Network Feature Representation
Technology | Artificial Intelligence | #artificial intelligence #neural networks

A study published on arXiv introduces a framework for designing task-based neurons inspired by the human brain's neuronal diversity. Using polynomials as base functions, experiments on synthetic data, classic benchmarks, and real-world applications demonstrate competitive performance against state-of-the-art models.

Jun 16, 2026 1 source
Gated QKAN-FWP: Quantum-Inspired Sequence Learning Achieves Parameter Efficiency on NISQ Devices
Technology | Artificial Intelligence | #quantum-inspired #sequence learning

A new quantum-inspired sequence learning model, Gated QKAN-FWP, uses single-qubit data re-uploading circuits to achieve high accuracy with only 12,500 parameters on long-horizon forecasting tasks. The model outperforms classical recurrent networks such as LSTM and WaveNet-LSTM while being deployable on current NISQ quantum hardware from IonQ and IBM.

Jun 16, 2026 1 source
Learned Image Compression Framework SPARC Boosts VLA Robot Control Performance in Bandwidth-Limited Settings
Technology | Artificial Intelligence | #learned image compression #vision-language-action models

Researchers introduce SPARC (SPatially Adaptive Rate Control), a learned image compression framework tailored for vision-language-action (VLA) models. SPARC adaptively allocates bitrate based on task relevance and uses a tilted rate loss to preserve critical visual patterns. Experiments on robotic benchmarks RoboCasa365, VLABench, and LIBERO show SPARC achieves stronger control performance than conventional codecs at the same bitrate, with real-world benefits for remote robot control.

Jun 16, 2026 1 source
Cascaded Sparse Autoencoders Enable Hierarchical Visual Concept Learning in Multimodal LLMs
Technology | Artificial Intelligence | #cascaded sparse autoencoders #multimodal llms

Researchers introduce cascaded sparse autoencoders (CSAEs) that learn hierarchical visual concepts in multimodal large language models. By training a second-level SAE on the decoder weights of the first, CSAEs achieve 'concepts of concepts' without nesting or stacking bottlenecks. Experiments on Qwen3-VL, Gemma-3, and LLaVA show improved interpretability and effective group-level steering.

Jun 16, 2026 1 source
PURe Module Enhances Vision Networks by Adding Multiplicative Local Interactions
Technology | Artificial Intelligence | #plug-and-play #product-unit

Researchers propose PURe, a Product-Unit Residual Module that introduces explicit multiplicative local interactions into deep vision networks. The module serves as a drop-in replacement for native residual units, consistently improving performance on benchmarks like ImageNet and CIFAR-10 while using smaller parameter budgets.

Jun 16, 2026 1 source
New Unified Definition of AI Hallucination Pins It on Inaccurate World Modeling
Technology | Artificial Intelligence | #hallucination #artificial intelligence

A new arXiv paper by Liu et al. proposes a unified definition of hallucination in large language models, defining it as inaccurate internal world modeling observable to the user. The framework subsumes prior definitions and distinguishes true hallucinations from planning or reward errors, and introduces the HalluWorld benchmark for stress-testing models.

Jun 16, 2026 1 source
Z-Plane Neural Networks Replace ReLU and LayerNorm with Bounded Geometric Activation
Technology | Artificial Intelligence | #z-plane neural networks #bounded geometric activation

Researchers propose Z-Plane Neural Networks, which replace traditional ReLU activations and LayerNorm with a bounded geometric activation called Radial Bounding. This new approach maintains 1-Lipschitz continuity, prevents gradient vanishing, and preserves directional information. A 100-layer Z-Plane MLP achieved 98.34% accuracy on MNIST without any ReLU or LayerNorm, demonstrating numerical stability.

Jun 16, 2026 1 source
New Architecture GRIL Enables Gradient Descent-Like Learning in Linear Recurrent Networks
Technology | Artificial Intelligence | #gradient descent #recurrent networks

Researchers introduce the Gradient-based Recurrent In-context Learner (GRIL), a linear recurrent network architecture with windowed cross-product self-attention that can implement minibatch gradient descent on a task-specific predictor in a single forward pass. The design achieves strong performance on synthetic in-context learning tasks, Long Range Arena, and language modeling.

Jun 16, 2026 1 source
New Drift-RAE Method Distills Transformers Efficiently Using Representation Autoencoders
Technology | Artificial Intelligence | #transformers #representation autoencoders

A new research paper proposes Drift-RAE, a method for distilling pretrained flow models in representation autoencoder latent spaces. It overcomes anisotropy and large curvature challenges, achieving 1.77 FID on ImageNet 256 with only 10,000 distillation steps, outperforming existing RAE distillation methods.

Jun 16, 2026 1 source
New Research Demystifies Variance in Circuit Discovery of Large Language Models
Technology | Artificial Intelligence | #llms #circuit discovery

A new research paper explores variance in circuit discovery of large language models, identifying resampling, rephrasing, and sample-wise variance. The authors propose CEAP, an improved method over EAP-IG with theoretical guarantees, and argue that rephrasing variance makes it hard to find comprehensive circuits, suggesting LLMs may be inherently difficult to steer.

Jun 16, 2026 1 source
New Theory Explains How Deep Transformers Achieve Adaptive Inference Using Function Vectors
Technology | Artificial Intelligence | #artificial intelligence #deep learning

A new research paper introduces a theory of deep transformers as mean-field interacting systems that implement distributed inference using 'function vectors' to adaptively infer latent context variables at finer scales over layers. The theory predicts a relationship between non-Gaussian hierarchical structure and transformer depth, tested with constrained linear attention models.

Jun 16, 2026 1 source
Lossy Compression Slashes Storage 39x for Neural Surrogate Models, Study Finds
Technology | Artificial Intelligence | #lossy compression #neural networks

A new study quantifies the impact of lossy compression on neural generative surrogate models, finding that storage can be reduced by up to 39x and training time by up to 3x with negligible effect on model quality, offering a path to more efficient AI training in data-intensive domains.

Jun 16, 2026 1 source
Pixel-TTS: Image-Based Text Rendering Improves Robustness in Speech Synthesis
Technology | Artificial Intelligence | #text-to-speech #artificial intelligence

Researchers propose Pixel-TTS, the first visually grounded text-to-speech framework that renders text as images and processes them with 2D convolutions. This eliminates embedding matrix expansion during fine-tuning and improves robustness to unseen characters and orthographic variations. Experiments show competitive performance with faster convergence and zero-shot generalization.

Jun 16, 2026 1 source
EEGNet Study Reveals Key Limitations in fNIRS Cognitive Load Classification
Technology | Artificial Intelligence | #eegnet #fnirs

A comprehensive study published on arXiv systematically evaluates EEGNet for classifying cognitive load from fNIRS signals. The research highlights critical challenges in generalization, achieving only 56.11% accuracy under subject-independent evaluation, and underscores the importance of segmentation strategy and learning rate selection.

Jun 16, 2026 1 source
Sub-Quadratic Vision Transformers Cut Self-Attention Cost for Faster Image Captioning
Technology | Artificial Intelligence | #vision transformers #image captioning

A new arXiv preprint from Ghosh et al. proposes a sub-quadratic vision transformer architecture for image captioning. By replacing standard self-attention with a Gaussian Mixture Model (GMM) clustering mechanism, the model reduces computational complexity from quadratic O(n²) to linear O(nK). The approach uses an autoregressive GPT-based decoder and achieves competitive results on the Flickr30K dataset.

Jun 16, 2026 1 source
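
To put the quadratic-versus-linear claim in the summary above into numbers, the short sketch below counts score computations for n tokens under standard self-attention (n × n) and under a scheme where each token interacts with K cluster components (n × K). The token count, the value of K, and the function names are illustrative assumptions, not figures or code from the preprint.

```python
def full_attention_scores(n_tokens: int) -> int:
    """Score count for standard self-attention: every token attends to every token."""
    return n_tokens * n_tokens


def clustered_scores(n_tokens: int, n_clusters: int) -> int:
    """Score count when each token interacts with K cluster components only."""
    return n_tokens * n_clusters


# Hypothetical sizes: a 14x14 patch grid (196 tokens) and 8 mixture components.
n, k = 196, 8
full = full_attention_scores(n)
clustered = clustered_scores(n, k)
ratio = full / clustered  # equals n / k, so the gap grows with image size

print(full, clustered, ratio)
```

Because the ratio is n / K, the saving grows as the number of image patches increases while K stays fixed.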
AI Safety Monitors May Fail After Model Updates, New Benchmarking Study Finds
Technology | Artificial Intelligence | #ai safety #model monitoring

A new research paper presents the first systematic test of whether activation monitors remain reliable after common model updates such as quantization and fine-tuning. The study finds that while quantization largely preserves performance, fine-tuning frequently makes monitors stale, with privacy monitors most affected. Degradation is predictable, enabling triaged revalidation.

Jun 16, 2026 1 source
Multi-Encoder-Decoder VAE Enables Cross-Subject Neural Alignment Without Shared Stimuli
Technology | Artificial Intelligence | #artificial intelligence #machine learning

A new Multi-Encoder-Decoder Variational Autoencoder (MED-VAE) achieves cross-subject alignment of neural activity without shared stimuli by using a pretrained artificial neural network as a scaffold. Tested on the Natural Scenes Dataset, MED-VAE creates semantically organized common latent spaces and outperforms traditional methods in generalization and cross-subject prediction.

Jun 16, 2026 1 source
AI-driven Landmark-free Assessment of Lower-limb Alignment with Implicit Neural Shape Functions from Knee Radiographs
Technology | Artificial Intelligence | #knee radiographs #lower-limb alignment

Researchers propose a landmark-free automated workflow using Implicit Neural Shape Functions (INSF) to assess lower-limb alignment from knee radiographs. The method encodes anatomy into a compact latent space and regresses clinical measurements directly, achieving performance comparable to manual methods and state-of-the-art landmark-based approaches. Trained on 566 radiographs and tested on internal and external datasets, the approach offers flexibility for extension to new tasks.

Jun 16, 2026 1 source
Cortical Geometry and Wiring Serve as Powerful Inductive Biases for Recurrent Neural Networks
Technology | Artificial Intelligence | #artificial intelligence #neural networks

A new study leveraging the MICrONS functional connectomics dataset demonstrates that recurrent neural networks initialized with cortical geometry, wiring, and functional relationships consistently outperform baseline and partially constrained models across three decision-making tasks, achieving lower entropy and modular organization.

Jun 16, 2026 1 source
AI-Driven Career Guidance System Achieves 94.71% Accuracy in Predicting Student Paths
Technology | Artificial Intelligence | #neural networks #student assessment

Researchers propose a real-time student assessment and career prediction system combining a Career Guidance Expert (CGE) with a web platform. The neural network model achieves 94.71% validation accuracy in recommending career paths for computing students.

Jun 16, 2026 1 source
Multiple Descents in Deep Learning Linked to Order-Chaos Transitions in LSTM Networks, New Research Shows
Technology | Artificial Intelligence | #deep learning #lstm

Researchers have observed a 'multiple-descent' phenomenon in LSTM networks, where test performance cycles through ups and downs after overtraining. Asymptotic stability analysis reveals these cycles are linked to order-chaos phase transitions, with the optimal training step occurring at the first transition from order to chaos, where the 'edge of chaos' is widest.

Jun 16, 2026 1 source
Expert Tying Reduces Memory Footprint of Mixture-of-Experts LLMs by Nearly Half
Technology | Artificial Intelligence | #tied expert layers #mixture-of-experts

A new arXiv paper from Jaggi proposes Expert Tying, an architectural modification for Mixture-of-Experts LLMs that shares expert parameters across consecutive transformer layers. Pretraining experiments on OLMoE, Qwen3, and DeepSeek-style architectures show an almost 2x reduction in memory footprint with virtually no degradation in perplexity or downstream quality.

Jun 16, 2026 1 source
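
A rough parameter count helps show why sharing experts across consecutive layers could cut memory by "almost" rather than exactly 2x. The sketch below assumes experts are tied in pairs of adjacent layers, that memory tracks parameter count, and that attention, router, and normalization weights stay per-layer. All sizes and the function name `moe_param_count` are hypothetical and are not taken from the paper.

```python
def moe_param_count(
    n_layers: int,
    n_experts: int,
    params_per_expert: int,
    other_params_per_layer: int,
    tie_group: int = 1,
) -> int:
    """Total parameters when every `tie_group` consecutive layers share one expert set.

    tie_group=1 is the untied baseline; tie_group=2 ties adjacent layer pairs.
    """
    # Number of distinct expert sets, rounded up when layers do not divide evenly.
    expert_sets = -(-n_layers // tie_group)
    expert_total = expert_sets * n_experts * params_per_expert
    # Non-expert weights (attention, router, norms) are assumed not to be shared.
    other_total = n_layers * other_params_per_layer
    return expert_total + other_total


# Hypothetical model: 16 layers, 64 experts of 1M parameters, 4M other parameters per layer.
untied = moe_param_count(16, 64, 1_000_000, 4_000_000, tie_group=1)
tied = moe_param_count(16, 64, 1_000_000, 4_000_000, tie_group=2)
reduction = untied / tied

print(untied, tied, round(reduction, 2))
```

Under these assumed sizes the expert weights halve while the per-layer weights do not, so the overall reduction lands just under 2x.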