AnchorEdit: Autoregressive Diffusion Tackles Identity Drift in Multi-Turn Image Editing

Researchers propose AnchorEdit, the first autoregressive diffusion-based framework for multi-turn image editing, addressing identity drift and error accumulation via a three-stage training curriculum and a causal memory mechanism. The method achieves state-of-the-art subject fidelity and instruction following over extended editing trajectories.

iGEN Editorial
June 16, 2026

Multi-turn image editing is essential for iterative design workflows, but existing models suffer from identity drift and error accumulation as edits are applied sequentially. According to a recent arXiv paper by Xu, Hang; Ma, Xiaoxiao; Zhang, Guohui; Yu, Fu; Siming, Huang; Jie, Lin; Haoyang, Song; Nan, Duan; and Feng, Zhao, dated June 10, 2026, current approaches that leverage video priors rely on bidirectional attention, which is fundamentally misaligned with the causal, sequential nature of interactive editing. To address this, the researchers introduce AnchorEdit, the first autoregressive (AR) diffusion-based framework designed for high-resolution, long-term multi-turn editing.

The Challenge of Temporal Consistency

In iterative design, whether for product prototyping, marketing visuals, or architectural renderings, users often need to apply several successive edits to an image while preserving the identity of key subjects. Existing models, including those built on video priors, gradually lose fidelity as changes accumulate. The paper notes that bidirectional attention does not account for the sequential order of edits, which leads to inconsistencies. AnchorEdit aims to bridge this gap between video priors and causal inference through a three-stage training curriculum and an inference-time memory mechanism.
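The difference between bidirectional and causal attention can be illustrated at the level of whole editing turns. The sketch below is not taken from the paper; the function names are hypothetical, and real models apply such masks over token sequences rather than a handful of turns.

```python
def bidirectional_mask(num_turns):
    """Every turn may attend to every other turn, including future ones."""
    return [[True] * num_turns for _ in range(num_turns)]


def causal_mask(num_turns):
    """Turn i may attend only to turns 0..i (itself and the past)."""
    return [[j <= i for j in range(num_turns)] for i in range(num_turns)]


def visible_turns(mask, turn):
    """List the turn indices that `turn` is allowed to attend to."""
    return [j for j, allowed in enumerate(mask[turn]) if allowed]


if __name__ == "__main__":
    # With four turns, turn 1 sees future turns under the bidirectional mask,
    # but only turns 0 and 1 under the causal mask.
    print(visible_turns(bidirectional_mask(4), 1))
    print(visible_turns(causal_mask(4), 1))
```

In an interactive session the future turns do not exist yet when an edit is made, which is why a mask that assumes access to them is a poor match for the task.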

AnchorEdit: A Novel Autoregressive Diffusion Framework

Rather than processing all edits jointly, AnchorEdit treats each editing step as a causal, sequential update that depends only on what came before. Its core contributions are the training recipe and the inference strategy. During inference, a memory mechanism anchors the initial subject identity, which the paper says keeps extrapolation stable across extended editing trajectories and lets the model stay consistent over many rounds.
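As a rough illustration of sequential editing with an identity anchor, the following sketch keeps the first-round image permanently in the context while later states pass through a bounded window. The class `AnchoredMemory`, the sliding window, and the toy `toy_edit` function are assumptions made for this example; the paper's actual memory design is not described in this article.

```python
from collections import deque


class AnchoredMemory:
    """Context buffer that never evicts the first-round state (hypothetical)."""

    def __init__(self, anchor, window=2):
        self.anchor = anchor                # initial subject, always kept
        self.recent = deque(maxlen=window)  # only the latest states are kept

    def push(self, state):
        self.recent.append(state)

    def context(self):
        return [self.anchor, *self.recent]


def run_edits(initial_image, instructions, edit_fn, window=2):
    """Apply instructions one at a time, each seeing only past context."""
    memory = AnchoredMemory(initial_image, window)
    current = initial_image
    for instruction in instructions:
        current = edit_fn(memory.context(), current, instruction)
        memory.push(current)
    return current, memory


def toy_edit(context, image, instruction):
    """Stand-in for a diffusion model: records the instruction on the image."""
    return f"{image}+{instruction}"


if __name__ == "__main__":
    final, memory = run_edits("img", ["a", "b", "c"], toy_edit)
    print(final)
    print(memory.context())
```

However long the session runs, the anchor stays in the context, so the model always has a reference to the original subject while older intermediate states are dropped.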

Three-Stage Training Curriculum

The training process consists of three distinct stages, each designed to prepare the model for long-horizon consistency:

| Stage | Purpose | Method |
|---|---|---|
| 1. Identity-preserving single-turn pretraining | Enable the model to learn high-fidelity single-turn edits without drift | Standard diffusion training on single editing steps |
| 2. Causal AR forcing fine-tuning | Teach the model to handle sequential dependencies and mitigate exposure bias | Novel self-rollout strategy where the model generates its own context during training |
| 3. Consistency distillation | Speed up inference to four steps while preserving quality | Distillation into a student model that generates high-quality outputs in fewer steps |

According to the paper, this curriculum ensures that AnchorEdit can maintain subject fidelity and follow complex instructions even over 10+ interaction rounds.
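The self-rollout idea in stage 2 can be contrasted with conventional teacher forcing in a few lines. This is a minimal sketch under stated assumptions: the helper names and the toy `noisy_step` model are hypothetical, and the paper's actual training loop, losses, and scheduling are not covered here.

```python
def teacher_forced_contexts(ground_truth):
    """Context for step t is the clean ground-truth history up to t."""
    return [ground_truth[:t] for t in range(1, len(ground_truth))]


def self_rollout_contexts(first_image, instructions, model_step):
    """Context for step t is the model's own earlier outputs."""
    history = [first_image]
    contexts = []
    for instruction in instructions:
        contexts.append(list(history))  # snapshot of what the model sees
        history.append(model_step(history, instruction))
    return contexts, history


def noisy_step(history, instruction):
    """Toy model: '~' marks an imperfect, model-generated result."""
    return f"{history[-1]}~{instruction}"


if __name__ == "__main__":
    print(teacher_forced_contexts(["x0", "x1", "x2"]))
    contexts, history = self_rollout_contexts("x0", ["a", "b"], noisy_step)
    print(contexts)
    print(history)
```

Training on contexts that already contain the model's own imperfect outputs exposes it to the conditions it meets at inference time, which is the usual motivation for mitigating exposure bias this way.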

Inference Memory Mechanism and Benchmarking

During inference, AnchorEdit employs a memory mechanism that persistently retains the initial subject identity from the first editing round. This prevents the model from 'forgetting' the original appearance as later transformations are applied. To evaluate performance, the authors introduce a new high-resolution multi-turn editing benchmark designed specifically to stress-test long-horizon stability. Extensive experiments, as reported in the paper, demonstrate that AnchorEdit achieves state-of-the-art results, with exceptional subject fidelity and instruction adherence across prolonged editing sequences.
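One simple way to picture long-horizon stability is to compare a subject embedding from each round against the embedding of the original subject. The sketch below uses cosine similarity on made-up two-dimensional vectors. It is an assumption for illustration only and is not the metric or the benchmark protocol used in the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def identity_scores(reference, per_round_embeddings):
    """Similarity of each round's subject embedding to the original."""
    return [cosine_similarity(reference, e) for e in per_round_embeddings]


if __name__ == "__main__":
    reference = [1.0, 0.0]  # hypothetical embedding of the initial subject
    rounds = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]  # hypothetical drift
    for i, score in enumerate(identity_scores(reference, rounds), start=1):
        print(f"round {i}: {score:.3f}")
```

A curve of these scores that stays flat over many rounds would indicate a stable subject, while a steady decline would indicate identity drift.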

The work represents a significant step forward for applications requiring iterative, high-quality image manipulation, such as product design, advertising, and content creation. By aligning the model's architecture with the causal nature of editing, AnchorEdit opens the door to more reliable and practical multi-turn editing tools.

