
Phase, Not Magnitude, Drives Image Classifier Predictions, New Research Reveals

A new study by Yıldırım tests whether image classifiers reproduce Oppenheim-Lim phase dominance inside their hidden layers. By transplanting the phase of one image onto the magnitude of another, the research finds that in PRISM2D, GFNet, and ViT-B/16, predictions follow the phase donor, and removing image-specific magnitude barely affects accuracy. ResNet-50 shows a latent sign code before its ReLU activations.

iGEN Editorial
June 16, 2026

Enterprise technology decision-makers evaluating computer vision AI need to understand what information neural networks actually use to make predictions. New research from Yıldırım, published on arXiv, reports that image classifiers rely primarily on phase information in their hidden representations, while image-specific magnitude is largely dispensable. The finding bears on how models process visual data and helps explain why CNNs and attention-based models respond differently to texture and shape cues.

The study builds on the classic Oppenheim and Lim (1981) result showing that natural images remain recognizable when reconstructed from Fourier phase alone. The researchers ask whether trained image classifiers reproduce this asymmetry inside their hidden layers and test it causally: given two images, they transplant the phase of one onto the magnitude of the other at a chosen layer and record which image the prediction follows.
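The transplant described above can be illustrated at the pixel level with a plain 2D FFT. This is a minimal sketch of the classic Oppenheim-Lim swap, not the study's code: the function name `transplant_phase` and the random test arrays are assumptions made for illustration, and the paper applies the intervention to hidden-layer activations rather than raw pixels.

```python
import numpy as np

def transplant_phase(phase_donor, magnitude_donor):
    """Combine the Fourier phase of one 2D array with the Fourier magnitude of another.

    Hypothetical helper for illustration; both inputs must have the same shape.
    """
    spectrum_p = np.fft.fft2(phase_donor)
    spectrum_m = np.fft.fft2(magnitude_donor)
    # Keep |F| from the magnitude donor and the angle from the phase donor.
    hybrid_spectrum = np.abs(spectrum_m) * np.exp(1j * np.angle(spectrum_p))
    # For real inputs the hybrid spectrum is conjugate-symmetric up to rounding,
    # so the imaginary part of the inverse transform is numerical noise.
    return np.real(np.fft.ifft2(hybrid_spectrum))

# Stand-in "images": random arrays, used only to exercise the function.
rng = np.random.default_rng(0)
image_a = rng.standard_normal((32, 32))  # phase donor
image_b = rng.standard_normal((32, 32))  # magnitude donor
hybrid = transplant_phase(image_a, image_b)
```

In the study's protocol, the hybrid would then be passed through the rest of the network and the prediction compared against the labels of both donors.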

Architectures Tested

The study examines four models: PRISM2D, GFNet, ViT-B/16, and ResNet-50. For PRISM2D, GFNet, and ViT-B/16, the prediction follows the phase or sign donor, and deleting all image-specific magnitude barely moves accuracy. This means identity rides on phase while image-specific magnitude is largely dispensable to the readout.
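The two measurements in this paragraph can be sketched as follows. Everything here is an assumption for illustration: the article does not say how the study scores "following the donor" or how it deletes image-specific magnitude. The sketch scores only pairs whose donors have different labels, and it removes image-specific magnitude by giving every sample the same batch-mean magnitude while keeping each sample's own phase.

```python
import numpy as np

def donor_following_rates(hybrid_preds, phase_labels, magnitude_labels):
    """Fraction of hybrid predictions that match the phase donor, the magnitude donor, or neither."""
    hybrid_preds = np.asarray(hybrid_preds)
    phase_labels = np.asarray(phase_labels)
    magnitude_labels = np.asarray(magnitude_labels)
    # Pairs whose donors share a label cannot tell the two hypotheses apart.
    informative = phase_labels != magnitude_labels
    if not informative.any():
        raise ValueError("no pairs with differing donor labels")
    preds = hybrid_preds[informative]
    phase_rate = float(np.mean(preds == phase_labels[informative]))
    magnitude_rate = float(np.mean(preds == magnitude_labels[informative]))
    return {"phase": phase_rate, "magnitude": magnitude_rate,
            "neither": 1.0 - phase_rate - magnitude_rate}

def replace_with_shared_magnitude(features, shared_magnitude):
    """Keep each sample's Fourier phase but give every sample the same magnitude spectrum."""
    spectrum = np.fft.fft2(features, axes=(-2, -1))
    swapped = shared_magnitude * np.exp(1j * np.angle(spectrum))
    return np.real(np.fft.ifft2(swapped, axes=(-2, -1)))

# Toy labels (made up): the last pair is skipped because both donors share label 5.
rates = donor_following_rates(
    hybrid_preds=[3, 3, 7, 1, 5],
    phase_labels=[3, 3, 7, 2, 5],
    magnitude_labels=[4, 8, 9, 1, 5],
)

# Toy feature maps (made up): batch of 5 maps, shared magnitude = batch mean.
rng = np.random.default_rng(0)
features = rng.standard_normal((5, 8, 8))
shared = np.abs(np.fft.fft2(features)).mean(axis=0)
ablated = replace_with_shared_magnitude(features, shared)
```

A high `phase` rate together with near-unchanged accuracy on the ablated features would correspond to the pattern the study reports for PRISM2D, GFNet, and ViT-B/16.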

Architecture | Behavior | Key Insight
PRISM2D | Prediction follows phase donor | Phase dominance in hidden layers
GFNet | Prediction follows phase donor | Phase dominance in hidden layers
ViT-B/16 | Prediction follows phase donor | Phase dominance in hidden layers
ResNet-50 | Appears to break pattern; latent sign code before ReLU | Rectification and readout geometry expose phase code differently

ResNet-50's Latent Phase Code

ResNet-50 at first seems to break the pattern because transplanting sign after its ReLUs does nothing. However, a fair intervention before the ReLU reveals a strong latent sign code in the late blocks, and a DC-only control shows the readout consumes a channel-wise spatial average. Controls rule out the trivial case in which magnitude simply stops depending on the image. The architectures therefore share a phase/sign identity code but expose it in different bases, set by rectification and readout geometry.
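A toy example shows why the position of the intervention matters. It is a sketch under stated assumptions, not the paper's procedure: the sign convention (+1 for values at or above zero, -1 otherwise), the helper names, and the random activations are all invented for illustration. The second half shows why a DC-only signal is enough for a readout that takes a channel-wise spatial average, as ResNet-50's global average pooling does.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sign_pm(x):
    """Assumed sign convention: +1 for x >= 0, -1 for x < 0."""
    return np.where(x >= 0, 1.0, -1.0)

def transplant_sign(sign_donor, magnitude_donor):
    """Combine the signs of one activation tensor with the absolute values of another."""
    return sign_pm(sign_donor) * np.abs(magnitude_donor)

def dc_only(feature_map):
    """Keep only the zero-frequency (DC) Fourier coefficient of each channel."""
    spectrum = np.fft.fft2(feature_map)
    kept = np.zeros_like(spectrum)
    kept[..., 0, 0] = spectrum[..., 0, 0]
    return np.real(np.fft.ifft2(kept))

def global_average_pool(feature_map):
    """Channel-wise spatial average over the last two axes."""
    return feature_map.mean(axis=(-2, -1))

rng = np.random.default_rng(0)
pre_a = rng.standard_normal((4, 6, 6))  # (channels, H, W) pre-activations, image A
pre_b = rng.standard_normal((4, 6, 6))  # pre-activations, image B

# After the ReLU every value is >= 0, so every sign is +1 and the
# "hybrid" is simply image B's activations: the transplant is a no-op.
post_hybrid = transplant_sign(relu(pre_a), relu(pre_b))
post_is_noop = np.array_equal(post_hybrid, relu(pre_b))

# Before the ReLU the signs differ per image, so the rectified hybrid
# is active exactly where image A's pre-activations were non-negative.
pre_hybrid = relu(transplant_sign(pre_a, pre_b))
pre_changes_output = not np.allclose(pre_hybrid, relu(pre_b))

# A DC-only feature map yields the same pooled vector as the full map.
pooled_full = global_average_pool(relu(pre_a))
pooled_dc = global_average_pool(dc_only(relu(pre_a)))
```

Under this convention, a post-ReLU sign swap cannot carry any image-specific information, which is consistent with the article's point that only an intervention before the ReLU is a fair test.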

Mechanistic Account of Texture–Shape Gap

The paper provides a mechanistic account of the texture–shape gap between CNNs and attention models. The differing exposure of the phase code explains why convolutional networks and transformer-based models behave differently when confronted with texture versus shape cues.


Implications for Enterprise Computer Vision

For technology leaders deploying computer vision in applications such as quality inspection, autonomous navigation, or document digitization, the practical point is that phase carries most of the class identity. Models may therefore be more robust when preprocessing and compression preserve phase features. Magnitude is not useless, but it matters less for the final classification. The insight could also guide the design of more efficient neural architectures that concentrate computation on phase processing.
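One way to check whether a preprocessing step preserves phase is to compare Fourier phase before and after it. The sketch below is an illustration only: the `phase_agreement` metric and both example operations are assumptions, and it measures input-space phase, whereas the study's results concern phase in hidden layers. The score is a magnitude-weighted mean of the cosine of the phase difference, so 1.0 means the phase is unchanged.

```python
import numpy as np

def phase_agreement(original, processed):
    """Magnitude-weighted mean cosine of the Fourier phase difference (1.0 = identical phase)."""
    spectrum_x = np.fft.fft2(original)
    spectrum_y = np.fft.fft2(processed)
    delta = np.angle(spectrum_y * np.conj(spectrum_x))
    weights = np.abs(spectrum_x)
    return float(np.sum(weights * np.cos(delta)) / np.sum(weights))

def zero_phase_lowpass(image, sigma=0.15):
    """Gaussian low-pass applied in the frequency domain with a real, positive gain.

    A real positive gain rescales magnitudes but leaves every phase untouched.
    """
    fy = np.fft.fftfreq(image.shape[0])[:, None]
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    gain = np.exp(-(fx**2 + fy**2) / (2.0 * sigma**2))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * gain))

# Stand-in image: a random array used only to exercise the functions.
rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32))

blurred = zero_phase_lowpass(image)                   # changes magnitude, keeps phase
shifted = np.roll(image, shift=(3, 5), axis=(0, 1))   # keeps magnitude, changes phase

blur_score = phase_agreement(image, blurred)
shift_score = phase_agreement(image, shifted)
```

The two operations are deliberate opposites: the blur alters only magnitude, while the circular shift alters only phase, so the metric separates them cleanly.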

The research also underscores that seemingly similar architectures may encode information differently due to activation functions and readout mechanisms. When selecting a model for a specific visual task, enterprises should consider not just final accuracy but how the model internally represents features.

All architectures studied share a common phase/sign identity code, but rectification and readout geometry determine how that code is read. This understanding can help bridge the performance gap between CNNs and attention models in practical deployments.

