Phase, Not Magnitude, Drives Image Classifier Predictions, New Research Reveals

A new study by Yıldırım tests whether image classifiers reproduce the Oppenheim-Lim phase dominance inside their hidden layers. By transplanting the phase of one image onto the magnitude of another, the research finds that in architectures such as ViT-B/16 and GFNet, predictions follow the phase donor, and removing image-specific magnitude barely affects accuracy. ResNet-50 carries a comparable code in a different form: a latent sign code that appears when the intervention is applied before the ReLU activation.

iGEN Editorial

June 16, 2026

Enterprise technology decision-makers evaluating computer vision AI need to understand what information neural networks actually use to make predictions. New research from Yıldırım, published on arXiv, reports that image classifiers rely primarily on phase information in their hidden representations, while image-specific magnitude is largely dispensable. The finding helps explain why different architectures can behave differently on the same visual data.

The study builds on the classic Oppenheim and Lim (1981) result showing that natural images remain recognizable when reconstructed from Fourier phase alone. The researchers ask whether trained image classifiers reproduce this asymmetry inside their hidden layers and test it causally: given two images, they transplant the phase of one onto the magnitude of the other at a chosen layer and record which image the prediction follows.
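The transplant operation can be illustrated in a few lines of NumPy. This is a minimal sketch of the classic Oppenheim-Lim style swap on raw 2D arrays, not the paper's code: the study applies the intervention at a chosen hidden layer, and the function name `transplant_phase` and the random test arrays are assumptions made for illustration.

```python
import numpy as np

def transplant_phase(phase_donor, magnitude_donor):
    """Combine the Fourier phase of one 2D array with the Fourier magnitude of another.

    Illustrative helper (hypothetical name); both inputs are real arrays of equal shape.
    """
    spectrum_p = np.fft.fft2(phase_donor)
    spectrum_m = np.fft.fft2(magnitude_donor)
    # Keep |F| from the magnitude donor and the angle from the phase donor.
    hybrid_spectrum = np.abs(spectrum_m) * np.exp(1j * np.angle(spectrum_p))
    # For real inputs the hybrid spectrum stays conjugate-symmetric, so the
    # imaginary part of the inverse transform is only rounding error.
    return np.fft.ifft2(hybrid_spectrum).real

# Stand-in data: two random "images" (assumption, for demonstration only).
rng = np.random.default_rng(0)
image_a = rng.standard_normal((8, 8))
image_b = rng.standard_normal((8, 8))

# Phase from image_a, magnitude from image_b.
hybrid = transplant_phase(image_a, image_b)
```

In the study's setup, the hybrid would be fed through the rest of the network, and the question is whether the prediction matches the class of the phase donor or the magnitude donor.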

Architectures Tested

The study examines four models: PRISM2D, GFNet, ViT-B/16, and ResNet-50. For PRISM2D, GFNet, and ViT-B/16, the prediction follows the phase or sign donor, and deleting all image-specific magnitude barely moves accuracy. In other words, class identity is carried by phase, while image-specific magnitude is largely dispensable to the readout.

Architecture	Behavior	Key Insight
PRISM2D	Prediction follows phase donor	Phase dominance in hidden layers
GFNet	Prediction follows phase donor	Phase dominance in hidden layers
ViT-B/16	Prediction follows phase donor	Phase dominance in hidden layers
ResNet-50	Appears to break pattern; latent sign code before ReLU	Rectification and readout geometry expose phase code differently
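One way to picture "deleting image-specific magnitude" is to keep each input's phase and replace its magnitude with a spectrum shared by every input. The sketch below does this with a batch-mean magnitude. That choice, the function name `strip_image_specific_magnitude`, and the random batch are assumptions for illustration; the article does not specify how the paper constructs its magnitude-free control.

```python
import numpy as np

def strip_image_specific_magnitude(batch):
    """Keep each item's Fourier phase; replace its magnitude with the batch mean.

    `batch` has shape (N, H, W). Using the batch-mean magnitude as the shared
    spectrum is an assumption of this sketch, not the paper's stated method.
    """
    spectra = np.fft.fft2(batch, axes=(-2, -1))
    # One magnitude spectrum shared by all items, so it carries no per-image information.
    shared_magnitude = np.abs(spectra).mean(axis=0, keepdims=True)
    stripped = shared_magnitude * np.exp(1j * np.angle(spectra))
    return np.fft.ifft2(stripped, axes=(-2, -1)).real

# Stand-in data: three random "images" (assumption, for demonstration only).
rng = np.random.default_rng(1)
batch = rng.standard_normal((3, 8, 8))
stripped_batch = strip_image_specific_magnitude(batch)
```

After this step every item has the same magnitude spectrum, so any remaining ability to tell the items apart must come from phase.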

ResNet-50's Latent Phase Code

ResNet-50 at first seems to break the pattern because transplanting sign after its ReLUs does nothing. However, a fair intervention before the ReLU reveals a strong latent sign code in the late blocks, and a DC-only control shows the readout consumes a channel-wise spatial average. Controls rule out the trivial case in which magnitude simply stops depending on the image. The architectures therefore share a phase/sign identity code but expose it in different bases, set by rectification and readout geometry.
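A toy example shows why the position of the intervention matters. The sketch below assumes a sign function that maps zero to +1 and uses random arrays in place of real ResNet-50 activations; the helper names `hard_sign`, `transplant_sign`, and `dc_only` are hypothetical and not taken from the paper.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def hard_sign(x):
    """Sign with zero mapped to +1 (a convention assumed for this sketch)."""
    return np.where(x >= 0, 1.0, -1.0)

def transplant_sign(sign_donor, magnitude_donor):
    """Combine the element-wise sign of one tensor with the absolute value of another."""
    return hard_sign(sign_donor) * np.abs(magnitude_donor)

def dc_only(feature_map):
    """Replace every channel of a (C, H, W) map with its spatial mean."""
    mean = feature_map.mean(axis=(-2, -1), keepdims=True)
    return np.broadcast_to(mean, feature_map.shape).copy()

# Stand-in pre-activation maps for two images (assumption, random data).
rng = np.random.default_rng(2)
pre_a = rng.standard_normal((4, 6, 6))
pre_b = rng.standard_normal((4, 6, 6))

# After the ReLU every value is non-negative, so all signs are +1 and the
# transplant returns the magnitude donor unchanged.
post_hybrid = transplant_sign(relu(pre_a), relu(pre_b))

# Before the ReLU the donor's signs are informative: after rectification,
# the hybrid is active exactly where the sign donor was non-negative.
pre_hybrid = transplant_sign(pre_a, pre_b)
active_pattern = relu(pre_hybrid) > 0

# A global-average-pooling readout sees only each channel's spatial mean,
# so the DC-only map yields the same pooled features as the original.
pooled_original = pre_a.mean(axis=(-2, -1))
pooled_dc_only = dc_only(pre_a).mean(axis=(-2, -1))
```

Under these assumptions, the post-ReLU transplant is a no-op by construction, which is consistent with the article's description that a sign code can only show up when the swap happens before rectification.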

Mechanistic Account of Texture–Shape Gap

The paper offers a mechanistic account of the texture–shape gap between CNNs and attention models. Because the two families expose the same phase code in different bases, convolutional networks and transformer-based models respond differently when confronted with texture versus shape cues.

Implications for Enterprise Computer Vision

For technology leaders deploying computer vision in applications such as quality inspection, autonomous navigation, or document digitization, the primacy of phase suggests that models may be more robust when preprocessing and compression preserve phase features. Magnitude information, while not useless, matters less for the final classification. The insight could also guide the design of more efficient neural architectures that focus computation on phase processing.

The research also underscores that seemingly similar architectures may encode information differently due to activation functions and readout mechanisms. When selecting a model for a specific visual task, enterprises should consider not just final accuracy but how the model internally represents features.

All architectures studied share a common phase/sign identity code, but rectification and readout geometry determine how that code is read. This understanding can help explain, and potentially narrow, the behavioral gap between CNNs and attention models in practical deployments.
