Modality-Aware Novelty Detection Framework MAND Improves Open-World Egocentric Activity Recognition

A new research paper introduces MAND, a modality-aware framework for multimodal egocentric open-world continual learning. MAND targets two limitations of existing methods: underused IMU cues and catastrophic forgetting. The authors report improved novelty detection and known-class accuracy on a public benchmark.

iGEN Editorial

June 16, 2026

Multimodal egocentric activity recognition, which combines visual and inertial cues to understand first-person behavior, faces significant hurdles when deployed in open-world environments. According to a paper on arXiv, existing methods struggle to detect activities never seen before while continuously learning from non-stationary data streams. The authors propose MAND (Modality-Aware Novelty Detection), a framework that adaptively leverages complementary evidence from multiple modalities to improve reliability.

The Problem with Existing Approaches

According to the paper, existing multimodal methods compute novelty scores from the main fused logits alone. This fails to exploit the complementary evidence carried by the individual modalities: because the fused logits are often dominated by RGB, cues from other modalities, particularly IMU (inertial measurement unit) signals, remain underutilized. The paper notes that this imbalance worsens as catastrophic forgetting accumulates, that is, as the network overwrites previously learned knowledge while learning new tasks.
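
To make the RGB-dominance problem concrete, here is a minimal Python sketch of a conventional fused-logit novelty score. It assumes late fusion by summing per-modality logits and uses one minus the maximum softmax probability (MSP) as the novelty score. Both are common baselines chosen for illustration, not details taken from the paper, and the logit values are made up.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_logits(per_modality):
    """Late fusion by element-wise summation of per-modality logits (assumed)."""
    return [sum(values) for values in zip(*per_modality.values())]


def msp_novelty_score(logits):
    """Novelty score = 1 - max softmax probability; higher means more likely novel."""
    return 1.0 - max(softmax(logits))


# Hypothetical sample: RGB logits are large and confident,
# IMU logits are small and favor a different class.
sample = {
    "rgb": [8.0, 1.0, 0.5],
    "imu": [0.2, 1.5, 0.1],
}

fused = fuse_logits(sample)
fused_score = msp_novelty_score(fused)
imu_score = msp_novelty_score(sample["imu"])
print(f"fused novelty: {fused_score:.4f}, IMU-only novelty: {imu_score:.4f}")
```

With these hypothetical numbers, the large RGB logits make the fused prediction look confidently "known", while the IMU logits alone would produce a much higher novelty score. The fused score effectively discards the IMU evidence.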

MAND: Dual Mechanism for Adaptive Learning

MAND introduces two key components. At inference, Modality-aware Adaptive Scoring (MoAS) adjusts each modality's contribution per sample according to its estimated reliability, and refines the novelty score with deviation and disagreement penalties, so that less reliable modalities are downweighted. During training, Modality-aware Representation Stabilization Training (MoRST) preserves the discriminative capacity of each modality across tasks by combining modality-specific heads with modality-wise logit distillation, which mitigates catastrophic forgetting.
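
The article does not give MoAS's formulas, so the following is only a hypothetical Python sketch of the general idea: reliability-weighted scoring plus deviation and disagreement penalties. The function name `adaptive_novelty_score`, the reliability definition (each head's own max softmax probability), the penalty definitions, and the weights `lambda_dev` and `lambda_dis` are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def total_variation(p, q):
    """Total variation distance: 0 for identical distributions, 1 for disjoint ones."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def adaptive_novelty_score(per_modality_logits, lambda_dev=0.5, lambda_dis=0.5):
    """Hypothetical MoAS-style novelty score (higher = more likely novel).

    Every definition below is an illustrative assumption, not the paper's formula:
      - reliability of a modality = max softmax probability of its own head,
      - weights = reliabilities normalized to sum to 1 (less reliable -> smaller weight),
      - deviation penalty = weighted mean absolute gap between each modality's
        novelty and the weighted base score,
      - disagreement penalty = mean pairwise total-variation distance.
    """
    probs = {m: softmax(z) for m, z in per_modality_logits.items()}
    reliability = {m: max(p) for m, p in probs.items()}
    total = sum(reliability.values())
    weights = {m: r / total for m, r in reliability.items()}

    # Per-modality novelty, combined with reliability weights.
    novelty = {m: 1.0 - reliability[m] for m in probs}
    base = sum(weights[m] * novelty[m] for m in probs)

    # Deviation penalty: how far modalities spread around the weighted score.
    deviation = sum(weights[m] * abs(novelty[m] - base) for m in probs)

    # Disagreement penalty: how differently modalities distribute probability mass.
    names = list(probs)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    disagreement = (
        sum(total_variation(probs[a], probs[b]) for a, b in pairs) / len(pairs)
        if pairs
        else 0.0
    )

    score = base + lambda_dev * deviation + lambda_dis * disagreement
    return score, weights


# Usage: same per-modality confidence, but IMU agrees in one case and conflicts in the other.
agree = {"rgb": [6.0, 0.5, 0.2], "imu": [4.0, 0.3, 0.1]}
conflict = {"rgb": [6.0, 0.5, 0.2], "imu": [0.1, 4.0, 0.3]}

agree_score, agree_weights = adaptive_novelty_score(agree)
conflict_score, conflict_weights = adaptive_novelty_score(conflict)
print(f"agree: {agree_score:.3f}  conflict: {conflict_score:.3f}")
print({m: round(w, 3) for m, w in conflict_weights.items()})
```

In both inputs each modality is equally confident, so the base score and deviation term are identical and the gap comes entirely from the disagreement term. When RGB and IMU point to different activities, this hypothetical score rises even though each modality is individually confident.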

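Modality-wise logit distillation can likewise be sketched in a few lines, again under stated assumptions. This hypothetical Python version compares each modality-specific head of the current model against the same head of a frozen teacher from the previous task, using temperature-softened KL divergence on the old-class logits as in standard knowledge distillation. The function names, temperature, and averaging over modalities are illustrative choices, not MoRST's published loss.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_norm for s in scaled]


def kl_divergence(teacher_logits, student_logits, temperature):
    """KL(p_teacher || p_student) between temperature-softened distributions."""
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def modality_wise_distillation_loss(teacher, student, num_old_classes, temperature=2.0):
    """Hypothetical modality-wise logit distillation loss, averaged over modalities.

    teacher / student map a modality name to the logits of that modality's own head.
    Only the first num_old_classes logits are compared, because the frozen teacher
    from the previous task has no outputs for classes added in the current task.
    """
    losses = []
    for modality, t_logits in teacher.items():
        s_logits = student[modality]
        kl = kl_divergence(
            t_logits[:num_old_classes], s_logits[:num_old_classes], temperature
        )
        # Scale by T^2, as in standard knowledge distillation (Hinton et al., 2015).
        losses.append(kl * temperature ** 2)
    return sum(losses) / len(losses)


# Usage: teacher heads know 3 old classes; student heads add 1 new class.
teacher = {"rgb": [3.0, 1.0, -1.0], "imu": [0.5, 2.0, 0.0]}
student_stable = {"rgb": [3.0, 1.0, -1.0, 0.7], "imu": [0.5, 2.0, 0.0, -0.2]}
student_drifted = {"rgb": [3.0, 1.0, -1.0, 0.7], "imu": [2.0, 0.0, 0.5, -0.2]}

print(modality_wise_distillation_loss(teacher, student_stable, num_old_classes=3))
print(modality_wise_distillation_loss(teacher, student_drifted, num_old_classes=3))
```

Because each head is distilled separately, drift in the IMU head is penalized even when the RGB head is unchanged. A single distillation term on RGB-dominated fused logits could let that drift go largely unnoticed, which matches the motivation the paper gives for treating modalities individually.
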
Experimental Results

The authors evaluated MAND on a public multimodal egocentric benchmark. According to the paper, MAND consistently improves novel activity detection and known-class accuracy while substantially reducing FPR95 (the false positive rate when the true positive rate, or recall, is 95%), indicating more reliable open-world recognition than existing methods. The source code is publicly available via the link in the paper.

Metric (qualitative)	Existing methods	MAND
Novel activity detection	Baseline	Improved
Known-class accuracy	Baseline	Improved
FPR95	Higher	Substantially reduced
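
For readers unfamiliar with FPR95, the metric can be computed as in this Python sketch. It assumes the convention common in out-of-distribution detection: scores are confidences where higher means "known", known samples are positives, and the threshold is set so that 95% of known samples are accepted. The function name `fpr_at_tpr` and the example scores are hypothetical.

```python
import math


def fpr_at_tpr(known_scores, novel_scores, tpr=0.95):
    """False positive rate on novel samples at the threshold that accepts `tpr` of known samples.

    Convention (assumed, common in out-of-distribution detection): scores are
    confidence values where higher means "known"; known samples are positives and
    novel samples are negatives. Lower FPR95 is better.
    """
    ranked = sorted(known_scores, reverse=True)
    # Number of known samples that must be accepted; the epsilon guards against
    # floating-point products such as 0.95 * n landing just above an integer.
    k = max(1, math.ceil(tpr * len(ranked) - 1e-9))
    threshold = ranked[k - 1]
    false_positives = sum(1 for s in novel_scores if s >= threshold)
    return false_positives / len(novel_scores)


# Hypothetical scores: 20 known samples and 5 novel samples.
known = [i / 20 for i in range(1, 21)]  # 0.05, 0.10, ..., 1.00
novel = [0.05, 0.08, 0.12, 0.30, 0.02]

print(fpr_at_tpr(known, novel))  # 2 of 5 novel samples pass the threshold
```

Using `>=` at the threshold counts ties as accepted; other implementations interpolate along the ROC curve, so exact FPR95 values can differ slightly between codebases.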

The research was conducted by Hyejeong Im, Wonseon Lim, and Dae-Won Kim. The paper is titled "MAND: Modality-Aware Novelty Detection for Open-World Egocentric Activity Recognition."

Implications for Enterprise AI

Although the research is academic, detecting novel activities in multimodal first-person data is relevant to enterprise systems that need anomaly detection, such as monitoring worker actions in manufacturing or logistics. MAND's focus on robustness and adaptability matches the needs of open-world deployments, where unseen events must be flagged reliably and new activities learned incrementally rather than through full manual retraining.

The publication on arXiv and the availability of source code enable further exploration and adoption by the research community.
