iGEN
Visit IGEN World Explore IGEN Expo
EXPLORE UPGRADE PLANS
BREAKING
Neuro-Symbolic Framework Improves Motion Prediction for Autonomous Vehicles in Mixed Traffic AI Scientist Automates Entire Research Lifecycle, Passes First Peer Review AI-driven Landmark-free Assessment of Lower-limb Alignment with Implicit Neural Shape Functions from Knee Radiographs Quantum Machine Learning for Industrial Applications: New Research Tackles Trainability and Expressivity New Method Resolves Drift Attribution Ambiguity in LLM Evaluation Pipelines New Hardware-Aware Neural Architecture Search Runs on Embedded Devices with Under 512MB RAM Malaysia's AI Agent-Powered Messaging Platform Respond.io Raises $62.5M, Targets Acquisitions MimicIK Framework Achieves Real-Time Inverse Kinematics with 4.65 mm Accuracy for Robotic Teleoperation Reward Hacking Still Undefeated: AI Safety Gridworlds Test Shows Exploits Persist Across LLM Scales Hormuz Threat Level Stays Severe Despite Peace Breakthrough as Explosions and Uncertainty Persist Neuro-Symbolic Framework Improves Motion Prediction for Autonomous Vehicles in Mixed Traffic AI Scientist Automates Entire Research Lifecycle, Passes First Peer Review AI-driven Landmark-free Assessment of Lower-limb Alignment with Implicit Neural Shape Functions from Knee Radiographs Quantum Machine Learning for Industrial Applications: New Research Tackles Trainability and Expressivity New Method Resolves Drift Attribution Ambiguity in LLM Evaluation Pipelines New Hardware-Aware Neural Architecture Search Runs on Embedded Devices with Under 512MB RAM Malaysia's AI Agent-Powered Messaging Platform Respond.io Raises $62.5M, Targets Acquisitions MimicIK Framework Achieves Real-Time Inverse Kinematics with 4.65 mm Accuracy for Robotic Teleoperation Reward Hacking Still Undefeated: AI Safety Gridworlds Test Shows Exploits Persist Across LLM Scales Hormuz Threat Level Stays Severe Despite Peace Breakthrough as Explosions and Uncertainty Persist
Home ›› Topics ›› multimodal

Topic

multimodal

7 stories
New Attack Forces Costly Model Usage in Multimodal LLM Cascades Technology
Artificial Intelligence #ai#llm

New Attack Forces Costly Model Usage in Multimodal LLM Cascades

A research paper introduces the Forced Deferral Attack (FDA), which manipulates confidence thresholds in multimodal large language model cascades, causing queries to be routed to more expensive strong models. The attack raises security concerns for enterprises deploying cost-optimized AI systems.

Jun 16, 2026 1 source
Scribby Multi-Level LLM Framework Promises Fine-Grained Semantic Analysis of Long-Form Video Technology
Artificial Intelligence #llm#video analysis

Scribby Multi-Level LLM Framework Promises Fine-Grained Semantic Analysis of Long-Form Video

Researchers propose Scribby, an LLM-based framework for semantic video analysis that balances macro-level comprehension with micro-level semantic indexing. The approach analyzes full transcripts, individual sentences, and groups sentences by semantic similarity using an LLM as a judge, enabling more detailed understanding of video structure and thematic progression.

Jun 16, 2026 1 source
MAGE-RAG: Multigranular Adaptive Graph Evidence Framework Improves Long-Document Multimodal QA Accuracy Technology
Artificial Intelligence #mage-rag#multimodal

MAGE-RAG: Multigranular Adaptive Graph Evidence Framework Improves Long-Document Multimodal QA Accuracy

The MAGE-RAG research paper introduces a multigranular adaptive graph evidence framework for multimodal retrieval-augmented generation (RAG) in long-document question answering. By building an evidence graph with page and element nodes and using an online controller to iteratively activate and prune evidence, it balances coverage and noise. Experiments show accuracy improvements over existing methods on LongDocURL and MMLongBench-Doc benchmarks.

Jun 16, 2026 1 source
Training-Free Framework Uses XAI and Multimodal LLMs to Generate Grounded Explanations for Speech Deepfake Detection Technology
Artificial Intelligence #deepfake detection#xai

Training-Free Framework Uses XAI and Multimodal LLMs to Generate Grounded Explanations for Speech Deepfake Detection

Researchers propose a training-free explanation framework that integrates XAI evidence with multimodal large language models to generate grounded and specific explanations for speech deepfake detection. Using the PartialSpoof dataset, the method increases inside accuracy by over 45%, verified through human evaluation and faithfulness checks.

Jun 16, 2026 1 source
Unifying Acoustic Features and Text with Multimodal LLMs for Neurodegenerative Disease Staging Technology
Artificial Intelligence #multimodal#llms

Unifying Acoustic Features and Text with Multimodal LLMs for Neurodegenerative Disease Staging

Researchers propose NeurMLLM, a multimodal generative framework that integrates acoustic features and text using a large language model for neurodegenerative disease staging. Evaluated on the Bridge2AI-Voice dataset, it outperforms classical machine learning and existing LLM-based methods for Alzheimer's and Parkinson's disease staging.

Jun 16, 2026 1 source
X-Tokenizer: Semantic Action Tokenizer Boosts Robot Control by 13.5% Over FAST Technology
Artificial Intelligence #x-tokenizer#multimodal

X-Tokenizer: Semantic Action Tokenizer Boosts Robot Control by 13.5% Over FAST

Researchers propose X-Tokenizer, a new action tokenizer that treats tokenization as semantic interface learning rather than mere compression. Using a lightweight encoder-Semantic Residual Quantization (SRQ)-decoder architecture, it improves multimodal grounding by 13.5% and long-horizon task performance by 8.25 points over existing methods like FAST.

Jun 16, 2026 1 source
Visual-Seeker: Visual-Native AI Agent for Active Visual Reasoning in Multimodal Search Technology
Artificial Intelligence #visual reasoning#multimodal

Visual-Seeker: Visual-Native AI Agent for Active Visual Reasoning in Multimodal Search

Researchers propose Visual-Seeker, a visual-native multimodal deep search agent that actively harvests fine-grained visual evidence during search. Using a synthesized dataset of 5K multimodal trajectories, it achieves state-of-the-art on five benchmarks, outperforming several proprietary models.

Jun 16, 2026 1 source