Modality-Aware Novelty Detection Framework MAND Improves Open-World Egocentric Activity Recognition

A new research paper introduces MAND, a modality-aware framework for multimodal egocentric open-world continual learning. MAND addresses limitations of existing methods that underutilize IMU cues and suffer from catastrophic forgetting, leading to improved novelty detection and known-class accuracy on a public benchmark.

iGEN Editorial
June 16, 2026
Multimodal egocentric activity recognition, which combines visual and inertial cues to understand first-person behavior, faces significant hurdles when deployed in open-world environments. According to a paper on arXiv, existing methods struggle to detect activities never seen before while continuously learning from non-stationary data streams. The authors propose MAND (Modality-Aware Novelty Detection), a framework that adaptively leverages complementary evidence from multiple modalities to improve reliability.

The Problem with Existing Approaches

According to the paper, existing multimodal systems typically compute novelty scores from the main fused logits alone. This fails to exploit the complementary evidence available from the individual modalities. Because the fused logits are often dominated by RGB, cues from other modalities, particularly the IMU (inertial measurement unit), remain underutilized. The paper notes that this imbalance worsens as catastrophic forgetting accumulates, that is, as the network overwrites previously learned knowledge while integrating new tasks.
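To make the RGB-dominance problem concrete, here is a minimal sketch of fused-logit novelty scoring. It is an illustration only, not the paper's code: the logit values are made up, fusion is assumed to be a plain sum of per-modality logits, and the novelty score is assumed to be one minus the maximum softmax probability, a common baseline that the article does not attribute to any specific method.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def msp_novelty_score(logits):
    """Assumed baseline score: 1 - max softmax probability (higher = more novel)."""
    return 1.0 - max(softmax(logits))

# Hypothetical per-modality logits over three known classes.
rgb_logits = [6.0, 1.0, 0.5]  # RGB is very confident in class 0
imu_logits = [0.4, 0.5, 0.3]  # IMU is close to uniform, hinting at unfamiliar motion

# Assumed fusion rule: element-wise sum of the modality logits.
fused_logits = [r + i for r, i in zip(rgb_logits, imu_logits)]

fused_score = msp_novelty_score(fused_logits)
imu_score = msp_novelty_score(imu_logits)
print(f"fused novelty: {fused_score:.3f}, IMU-only novelty: {imu_score:.3f}")
```

In this toy case the fused score stays close to zero even though the IMU branch on its own looks uncertain, which is the kind of lost evidence the paper describes.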

MAND: Dual Mechanism for Adaptive Learning

MAND introduces two components. At inference, the Modality-aware Adaptive Scoring (MoAS) mechanism adjusts each modality's contribution according to its sample-wise reliability. It then refines the novelty score with deviation and disagreement penalties, so that less reliable modalities are downweighted. During training, Modality-aware Representation Stabilization Training (MoRST) preserves the discriminative capacity of each modality across tasks. It does this through modality-specific heads and modality-wise logit distillation, which are intended to mitigate catastrophic forgetting.
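The article does not give the paper's formulas, so the following is only a rough sketch of the two ideas under loudly stated assumptions. Reliability is assumed to be one minus the normalized entropy of a modality's softmax output, the disagreement penalty is assumed to be a fixed constant added when modalities predict different classes, and the distillation term is the standard temperature-scaled KL divergence applied per modality. The deviation penalty is left out because nothing here defines it. All function names, parameters, and values are hypothetical.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def softmax(logits):
    return [math.exp(lp) for lp in log_softmax(logits)]

def reliability(probs):
    """Assumed proxy: 1 - normalized entropy (needs at least two classes)."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return max(0.0, 1.0 - entropy / math.log(len(probs)))

def adaptive_novelty_score(modality_logits, disagreement_weight=0.5):
    """Reliability-weighted novelty plus an assumed disagreement penalty."""
    probs = {name: softmax(l) for name, l in modality_logits.items()}
    rel = {name: reliability(p) for name, p in probs.items()}
    total = sum(rel.values())
    if total > 0:
        weights = {name: r / total for name, r in rel.items()}
    else:  # every modality is uniform: fall back to equal weights
        weights = {name: 1.0 / len(rel) for name in rel}
    # Per-modality novelty is 1 - max probability, mixed by reliability.
    base = sum(weights[name] * (1.0 - max(p)) for name, p in probs.items())
    # Penalize samples whose modalities predict different top classes.
    top_classes = {p.index(max(p)) for p in probs.values()}
    penalty = disagreement_weight if len(top_classes) > 1 else 0.0
    return base + penalty, weights

def modality_distillation_loss(old_logits, new_logits, temperature=2.0):
    """Sum over modalities of KL(old || new) on temperature-softened outputs."""
    loss = 0.0
    for name in old_logits:
        old_lp = log_softmax([x / temperature for x in old_logits[name]])
        new_lp = log_softmax([x / temperature for x in new_logits[name]])
        loss += sum(math.exp(o) * (o - n) for o, n in zip(old_lp, new_lp))
    return loss

# Hypothetical logits: the modalities agree in one sample and disagree in the other.
agree = {"rgb": [4.0, 0.5, 0.2], "imu": [2.0, 0.3, 0.1]}
disagree = {"rgb": [4.0, 0.5, 0.2], "imu": [0.1, 2.0, 0.3]}
agree_score, agree_weights = adaptive_novelty_score(agree)
disagree_score, _ = adaptive_novelty_score(disagree)
print(agree_score, disagree_score, agree_weights)

# Distillation is zero when the new model reproduces the old logits.
same_loss = modality_distillation_loss(agree, agree)
drift_loss = modality_distillation_loss(agree, disagree)
print(same_loss, drift_loss)
```

Keeping a separate distillation term per modality, rather than one term on the fused output, is what lets a weaker branch such as the IMU retain its own decision boundaries in this sketch.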

Experimental Results

The authors tested MAND on a public multimodal egocentric benchmark. The results show that MAND consistently improves novel activity detection and known-class accuracy while substantially reducing FPR95, the false positive rate measured when the true positive rate is fixed at 95%. This indicates more reliable open-world recognition than existing methods. The source code is publicly available at the link in the paper.

Metric                     Existing Methods   MAND
Novel activity detection   Baseline           Improved
Known-class accuracy       Baseline           Improved
FPR95                      Higher             Substantially reduced
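For readers unfamiliar with FPR95, the sketch below shows one common way to compute it from novelty scores. It assumes that a higher score means "more novel" and that novel samples are the positive class. Conventions differ between papers, and the article does not state which one MAND uses. The scores are made-up values, not results from the paper.

```python
import math

def fpr_at_tpr(novel_scores, known_scores, tpr_target=0.95):
    """False positive rate on known samples at the threshold that
    flags at least `tpr_target` of the novel samples as novel."""
    ranked = sorted(novel_scores, reverse=True)
    # Number of novel samples that must score at or above the threshold.
    k = math.ceil(tpr_target * len(ranked))
    threshold = ranked[k - 1]
    # Known samples at or above the threshold are wrongly flagged as novel.
    false_positives = sum(1 for s in known_scores if s >= threshold)
    return false_positives / len(known_scores)

# Hypothetical novelty scores for illustration.
novel = [0.9, 0.8, 0.7, 0.6]
known = [0.1, 0.2, 0.65, 0.3]
overlap_fpr = fpr_at_tpr(novel, known)
clean_fpr = fpr_at_tpr([0.9, 0.8], [0.1, 0.2])
print(overlap_fpr, clean_fpr)
```

A lower value is better: it means fewer known activities are mistaken for novel ones at the operating point where nearly all truly novel activities are caught.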

The research was conducted by Hyejeong Im, Wonseon Lim, and Dae-Won Kim. The paper is titled "MAND: Modality-Aware Novelty Detection for Open-World Egocentric Activity Recognition."

Implications for Enterprise AI

While the research is academic, the ability to detect novel activities in first-person video with multimodal data has relevance for enterprise systems that require anomaly detection, such as monitoring worker actions in manufacturing or logistics. The MAND framework's focus on robustness and adaptability aligns with the needs of open-world deployments where unseen events must be detected reliably without manual retraining.

The publication on arXiv and the availability of source code enable further exploration and adoption by the research community.

