
Computer Vision

60 stories

Study Finds Hybrid CNN-Clay Model Improves Landslide Detection Accuracy Over Baseline

A study evaluates Clay v1.5, a Geospatial Foundation Model, for pixel-level landslide segmentation on the Landslide4Sense benchmark. The hybrid U-Net + Clay model with two-stage LoRA achieves a test F1 of 64.5%, outperforming both the Clay-only backbone and a standard U-Net baseline.
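
The reported test F1 is a pixel-level score. As a minimal sketch, assuming binary masks stored as nested lists of 0/1 (a hypothetical format, not the benchmark's own data loader), pixel-level F1 for the landslide class can be computed like this:

```python
def pixel_f1(pred, target):
    """Pixel-level F1 for binary masks given as nested lists of 0/1."""
    tp = fp = fn = 0
    for prow, trow in zip(pred, target):
        for p, t in zip(prow, trow):
            if p and t:
                tp += 1  # landslide pixel correctly predicted
            elif p and not t:
                fp += 1  # predicted landslide where there is none
            elif t and not p:
                fn += 1  # missed landslide pixel
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero/undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

When neither mask contains any landslide pixels, F1 is undefined; this sketch returns 0.0 in that case, and real evaluation protocols may handle it differently.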

Jun 17, 2026 1 source

CrossMaps: Real-Time Open-Vocabulary Semantic Mapping for Autonomous Rover Navigation

A new research paper presents CrossMaps, a real-time confidence-aware open-vocabulary semantic mapping pipeline that constructs language-queryable maps from RGB-D data for rover navigation. It integrates multi-scale CLIP embeddings with confidence-aware fusion and a dual-memory architecture, running on a Jetson Orin-powered UGV alongside SLAM.
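
The summary does not give CrossMaps' exact fusion rule. One common way to make map fusion confidence-aware is a confidence-weighted running average per map cell; the sketch below assumes that scheme, with a hypothetical `MapCell` class and plain lists standing in for CLIP embeddings.

```python
from dataclasses import dataclass, field


@dataclass
class MapCell:
    """Hypothetical map cell holding a fused embedding and its accumulated confidence."""
    embedding: list = field(default_factory=list)
    weight: float = 0.0

    def fuse(self, obs, confidence):
        """Blend a new observation embedding in proportion to its confidence."""
        if confidence <= 0:
            return  # ignore observations with no confidence
        if not self.embedding:
            self.embedding = list(obs)
            self.weight = confidence
            return
        total = self.weight + confidence
        self.embedding = [
            (self.weight * e + confidence * o) / total
            for e, o in zip(self.embedding, obs)
        ]
        self.weight = total
```

Because the accumulated weight grows with every confident observation, a single low-confidence view cannot overwrite a cell that has been observed reliably many times.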

Jun 17, 2026 1 source

Region-Adaptive Sampling Cuts Diffusion Transformer Inference Time by Up to 2.5x With Negligible Quality Loss

Researchers introduce RAS, a training-free sampling method for Diffusion Transformers that updates only the regions in focus at each step and caches the rest. According to a new arXiv paper, it achieves up to a 2.51x speedup on Lumina-Next-T2I and 2.36x on Stable Diffusion 3 with minimal quality loss. A user study found comparable quality at a 1.6x speedup.
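
The caching idea can be illustrated with a toy step function. This is a sketch under simplifying assumptions, not the RAS algorithm: tokens are scalars, the per-token focus scores are supplied by the caller rather than derived from the model as in the paper, and `model_fn` is a hypothetical stand-in for a Diffusion Transformer forward pass.

```python
def region_adaptive_step(tokens, cached_update, model_fn, scores, k):
    """One sampling step that recomputes updates only for the k highest-scoring tokens.

    tokens: current latent values, one float per token
    cached_update: per-token update reused from the previous step
    model_fn: hypothetical callable(index, value) -> fresh update for one token
    scores: per-token focus scores (higher means the region needs refreshing)
    """
    # Pick the k tokens with the highest focus scores.
    focus = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k]
    update = list(cached_update)  # start from the cached updates
    for i in focus:
        update[i] = model_fn(i, tokens[i])  # fresh computation only for focused tokens
    new_tokens = [t + u for t, u in zip(tokens, update)]
    return new_tokens, update, focus
```

Only `k` tokens trigger fresh computation per step; the rest reuse the previous step's update, which is where the savings come from in this toy.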

Jun 17, 2026 1 source

Input-Dependent Fisher Information Enables Local Sensitivity Analysis of Medical Image Classifiers

A research paper introduces a local sensitivity analysis framework based on the input-dependent Fisher Information Matrix (iFIM) for medical image classifiers. The method projects input images into high- and low-sensitivity components, showing that high-sensitivity components are more strongly tied to predictive confidence and classification performance. This provides a principled tool for interpreting black-box deep neural networks in medical imaging.

Jun 17, 2026 1 source

M*: A Modular, Extensible Serving System for Efficient Multimodal AI Inference

Researchers have developed M*, a universal serving system for composite AI models that integrates diverse components like vision encoders and language backbones. Using a novel 'Walk Graph' abstraction, M* achieves significant performance improvements: 20% lower latency for text-to-image, up to 2.7x higher throughput for text-to-speech, and 12.5x faster robotic planning rollouts compared to existing baselines.

Jun 16, 2026 1 source

New Benchmark and Method Address Occlusion in Vision-Language-Action Models for Robotics

Researchers introduced LIBERO-Occ, an occlusion-oriented benchmark for Vision-Language-Action (VLA) models, and proposed Viewpoint Imagination (VIM), a method that generates a complementary view from an occluded primary observation to condition action prediction. Experiments show that state-of-the-art VLAs suffer substantial performance degradation under occlusion, and VIM improves robustness across task suites, occlusion types, and severity levels without requiring additional cameras at deployment.

Jun 16, 2026 1 source

Wasserstein Equilibrium Decoding Boosts Reliability in Medical Visual Question Answering

Researchers have extended game-theoretic decoding to vision-language models for medical visual question answering, introducing a Wasserstein stopping criterion that improves accuracy by up to 3.5 percentage points and reduces inference iterations by 20% while maintaining reliability.

Jun 16, 2026 1 source

BRITE Benchmark Reveals Critical Gaps in Text-to-Video Models' Object-Action Binding and Audio-Visual Sync

A new benchmark called BRITE provides the first unified framework for evaluating text-to-video (T2V) models on implausible prompts, audio-visual consistency, and interpretable QA-based assessment. Testing five state-of-the-art models including Sora 2 and Veo 3.1, BRITE reveals that while models excel at static object composition, they show significant degradation in object-action binding and audio-visual synchronization.

Jun 16, 2026 1 source

Language-Guided AI Framework CLARITY Boosts Road Scene Segmentation for Autonomous Logistics

Researchers propose CLARITY, a language-guided framework for RGB-Thermal semantic segmentation that dynamically adapts its fusion strategy to scene illumination. On the MFNet dataset, it achieves 62.3% mIoU and 77.5% mAcc, setting a new state of the art for robust road scene understanding in autonomous driving, a capability critical for logistics automation.
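
For reference, mIoU and mAcc are both class-averaged scores derived from a confusion matrix. A minimal sketch, assuming a square matrix of pixel counts; skipping classes absent from the ground truth is one convention among several, and the paper's exact protocol may differ:

```python
def miou_macc(conf):
    """Return (mIoU, mAcc) from conf[i][j] = pixels of true class i predicted as class j."""
    n = len(conf)
    ious, accs = [], []
    for c in range(n):
        tp = conf[c][c]
        gt = sum(conf[c])                          # pixels whose true class is c
        pred = sum(conf[r][c] for r in range(n))   # pixels predicted as class c
        if gt == 0:
            continue  # class absent from ground truth: skipped in this convention
        union = gt + pred - tp
        ious.append(tp / union)
        accs.append(tp / gt)  # per-class pixel accuracy (recall)
    return sum(ious) / len(ious), sum(accs) / len(accs)
```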

Jun 16, 2026 1 source

Biological Vision Inspired Framework Improves Machine Perception of Illusory Contours for AI Systems

A team of researchers has developed a novel deep network called ICPNet, inspired by the visual cortex, that significantly improves machine perception of abutting grating illusory contours. The approach addresses a key limitation of current deep neural networks, achieving notable gains in top-1 accuracy on new test sets.

Jun 16, 2026 1 source

AnchorEdit: Autoregressive Diffusion Tackles Identity Drift in Multi-Turn Image Editing

Researchers propose AnchorEdit, the first autoregressive diffusion-based framework for multi-turn image editing, addressing identity drift and error accumulation via a three-stage training curriculum and a causal memory mechanism. The method achieves state-of-the-art subject fidelity and instruction following over extended editing trajectories.

Jun 16, 2026 1 source

3D Skeleton Person Re-Identification Survey Reveals Taxonomy, Advances, and Interdisciplinary Potential

A new survey on 3D skeleton-based person re-identification (SRID) provides a comprehensive taxonomy covering hand-crafted, sequence-based, and graph-based modeling approaches, along with supervised, self-supervised, and unsupervised learning paradigms. The paper reviews state-of-the-art methods, evaluates them on standard benchmarks, and discusses key challenges and interdisciplinary prospects, with potential applications in security, biometrics, and beyond.

Jun 16, 2026 1 source

SpatialWorld Benchmark Reveals Multimodal Agents Struggle with Interactive Spatial Reasoning

Researchers introduced SpatialWorld, a benchmark for evaluating the interactive spatial understanding of multimodal agents in real-world tasks. Across the 15 advanced agents tested, the strongest model (GPT-5) achieved a task success rate of only 17.4%, highlighting challenges in active exploration and long-horizon planning.

Jun 16, 2026 1 source

Snap Launches $2,195 AR Glasses 'Specs' for Consumer Market, Available for Preorder

Snap has unveiled its first consumer augmented-reality glasses called Specs at the AWE tech conference. Priced at $2,195 with a $220 deposit, the glasses offer a 51-degree field of view, dual Qualcomm Snapdragon processors, and hand-tracking cameras. Preorders are open now for shipping in fall 2026 in the US, UK, and France.

Jun 16, 2026 1 source

Modality-Aware Novelty Detection Framework MAND Improves Open-World Egocentric Activity Recognition

A new research paper introduces MAND, a modality-aware framework for multimodal egocentric open-world continual learning. MAND addresses limitations of existing methods that underutilize IMU cues and suffer from catastrophic forgetting, leading to improved novelty detection and known-class accuracy on a public benchmark.

Jun 16, 2026 1 source

Phase, Not Magnitude, Drives Image Classifier Predictions, New Research Reveals

A new study by Yıldırım tests whether image classifiers reproduce the Oppenheim-Lim phase dominance inside their hidden layers. By transplanting the phase of one image onto the magnitude of another, the study finds that in architectures such as ViT-B/16 and GFNet, predictions follow the phase donor, and removing image-specific magnitude barely affects accuracy. ResNet-50 exhibits a latent sign code before ReLU activation.
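
The Oppenheim-Lim experiment recombines the Fourier magnitude of one signal with the Fourier phase of another. The study applies this to images and hidden-layer features; the sketch below only illustrates the swap itself on a short 1D signal, using a naive pure-Python DFT for self-containment (function names are illustrative).

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(X):
    """Inverse of dft()."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n for t in range(n)]


def phase_swap(magnitude_donor, phase_donor):
    """Combine the Fourier magnitude of one real signal with the phase of another."""
    A = dft(magnitude_donor)
    B = dft(phase_donor)
    hybrid = [abs(a) * cmath.exp(1j * cmath.phase(b)) for a, b in zip(A, B)]
    # For real inputs the hybrid spectrum is conjugate-symmetric, so the result is real.
    return [v.real for v in idft(hybrid)]
```

Swapping a signal with itself returns the original signal. With an impulse as magnitude donor (its spectrum is flat), `phase_swap([1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0])` reproduces the phase donor, a small instance of the output following the phase.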

Jun 16, 2026 1 source

MapDream: Task-Driven Map Learning Achieves State-of-the-Art Vision-Language Navigation

Researchers propose MapDream, a framework that learns bird's-eye-view maps directly from navigation objectives rather than hand-crafted reconstruction. The approach achieves state-of-the-art monocular performance on the R2R-CE and RxR-CE benchmarks.

Jun 16, 2026 1 source

DySink: Dynamic Frame Sinks Enable Adaptive Long Video Generation Without Context Collapse

Researchers propose DySink, a retrieval-based framework that replaces static early-frame sinks with dynamic, visually relevant historical frames for autoregressive long video generation. This approach prevents sink collapse and improves temporal quality in minute-long videos.
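
The retrieval step can be sketched as ranking past frames by similarity to the current one. This assumes each frame is summarized by a feature vector and that cosine similarity is the relevance measure; both are illustrative assumptions, not details confirmed by the summary.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def select_sinks(history, query, k):
    """Return indices of the k past frames most similar to the current query frame."""
    ranked = sorted(range(len(history)), key=lambda i: cosine(history[i], query), reverse=True)
    return sorted(ranked[:k])  # restore temporal order for the attention context
```

Unlike a static sink that always keeps the first frames, the selected set changes as the video content drifts.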

Jun 16, 2026 1 source

SceneConductor Generates 3D Scenes from Single Images Using Multi-Agent Orchestration

Researchers propose SceneConductor, a multi-agent orchestration framework that decomposes single-image 3D scene generation into three structured stages: initialization, environment construction, and refinement. It also introduces a geometry-aware layout predictor to reduce reliance on scene-level annotations. Experiments show it consistently outperforms prior approaches in geometric accuracy, spatial consistency, and perceptual realism.

Jun 16, 2026 1 source

Lightweight Distillation of SAM 3 and DINOv3 for Edge-Deployable Livestock Monitoring

Researchers distilled SAM 3's 446M-parameter backbone into a 40.66M-parameter student, achieving 92.29% MOTA and 96.15% IDF1 on the Edinburgh Pig dataset. The pipeline runs on an NVIDIA Jetson Orin NX 16GB with 4.9GB headroom, enabling on-device individual-level livestock monitoring and longitudinal visual analytics.
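
MOTA and IDF1 are standard multi-object tracking metrics. Given aggregate counts for a sequence (producing those counts requires a frame-by-frame matching step that is omitted here), they reduce to the formulas below; the counts in the usage line are made up for illustration.

```python
def mota(fn, fp, id_switches, num_gt):
    """CLEAR MOT accuracy: 1 - (misses + false positives + ID switches) / ground-truth objects."""
    return 1.0 - (fn + fp + id_switches) / num_gt


def idf1(idtp, idfp, idfn):
    """Identity F1: harmonic mean of ID precision and ID recall."""
    return 2 * idtp / (2 * idtp + idfp + idfn)


# Illustrative counts only, not from the paper.
example = (mota(fn=5, fp=3, id_switches=2, num_gt=100), idf1(idtp=90, idfp=5, idfn=5))
```

MOTA penalizes detection errors and identity switches together, while IDF1 measures how consistently each animal keeps the same ID over time, which is what longitudinal per-animal analytics depends on.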

Jun 16, 2026 1 source

Uncertainty Quality of VGGT: Analysis on DTU Benchmark Dataset Reveals Effective Confidence Threshold for 3D Reconstruction

A new paper investigates the uncertainty predictions of the Visual Geometry Grounded Transformer (VGGT), winner of the Best Paper award at CVPR 2025. An analysis on the DTU benchmark dataset identifies an effective confidence threshold for filtering VGGT's raw output and shows potential for improving 3D reconstruction accuracy.
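
Threshold filtering discards predicted points whose confidence falls below a cutoff. A minimal sketch; the summary does not state the threshold the paper found, so the cutoff and point values below are hypothetical.

```python
def filter_by_confidence(points, confidences, threshold):
    """Keep only points whose predicted confidence is at least `threshold`."""
    if len(points) != len(confidences):
        raise ValueError("points and confidences must have the same length")
    return [p for p, c in zip(points, confidences) if c >= threshold]


# Hypothetical per-point outputs: (x, y, z) positions with confidence scores.
points = [(0.0, 0.0, 1.0), (0.1, 0.2, 1.1), (5.0, 5.0, 9.0)]
confidences = [3.2, 2.7, 0.4]
kept = filter_by_confidence(points, confidences, threshold=1.0)  # hypothetical cutoff
```

The trade-off is between completeness and accuracy: a higher cutoff removes more outliers but also more valid surface points.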

Jun 16, 2026 1 source

Learned Image Compression Framework SPARC Boosts VLA Robot Control Performance in Bandwidth-Limited Settings

Researchers introduce SPARC (SPatially Adaptive Rate Control), a learned image compression framework tailored for vision-language-action (VLA) models. SPARC adaptively allocates bitrate based on task relevance and uses a tilted rate loss to preserve critical visual patterns. Experiments on robotic benchmarks RoboCasa365, VLABench, and LIBERO show SPARC achieves stronger control performance than conventional codecs at the same bitrate, with real-world benefits for remote robot control.

Jun 16, 2026 1 source

K-Prism Model Unifies Medical Image Segmentation with Knowledge-Guided Prompt Integration

Researchers present K-Prism, a unified segmentation framework that integrates three knowledge paradigms (semantic priors, in-context examples, and interactive feedback) via a dual-prompt representation and a Mixture-of-Experts decoder. Tested on 18 public datasets spanning multiple modalities, K-Prism achieves state-of-the-art performance across semantic, in-context, and interactive segmentation tasks.

Jun 16, 2026 1 source

Cascaded Sparse Autoencoders Enable Hierarchical Visual Concept Learning in Multimodal LLMs

Researchers introduce cascaded sparse autoencoders (CSAEs) that learn hierarchical visual concepts in multimodal large language models. By training a second-level SAE on the decoder weights of the first, CSAEs achieve 'concepts of concepts' without nesting or stacking bottlenecks. Experiments on Qwen3-VL, Gemma-3, and LLaVA show improved interpretability and effective group-level steering.

Jun 16, 2026 1 source

VinQA Dataset Enables Multimodal Document QA with Interleaved Visual Elements for Enterprise AI

A new dataset called VinQA targets long-form answer generation in multimodal document QA, where cited visual elements are interleaved with text. The paper compares two encoding methods, presents an evaluation framework, and shows that fine-tuning open Qwen2.5-VL models can approach the performance of proprietary frontier models.

Jun 16, 2026 1 source

ControlMap: Controllable HD Map Generation Using Latent Diffusion for Traffic Simulation

Autonomous driving simulation is currently limited by the cost of creating HD maps. ControlMap presents a pipeline that uses latent diffusion and ControlNet to generate HD maps that follow specified road topologies and city styles. The work also introduces new metrics for adherence and similarity.

Jun 16, 2026 1 source

Akasha 2 Achieves 4x Faster Visual Synthesis with Hamiltonian-Inspired AI Architecture

Akasha 2 introduces Hamiltonian State Space Duality and Visual-Language Joint Embedding Predictive Architecture, achieving state-of-the-art video prediction with 4x faster synthesis than diffusion models and 3-18x speedup over transformers. The system enforces physical conservation laws for spatiotemporal coherence.

Jun 16, 2026 1 source

PURe Module Enhances Vision Networks by Adding Multiplicative Local Interactions

Researchers propose PURe, a Product-Unit Residual Module that introduces explicit multiplicative local interactions into deep vision networks. The module serves as a drop-in replacement for native residual units, consistently improving performance on benchmarks like ImageNet and CIFAR-10 while using smaller parameter budgets.

Jun 16, 2026 1 source

SACE Framework Introduces First Scale-Aware Concept Erasure for Visual Autoregressive Models to Prevent Catastrophic Semantic Collapse

Researchers propose SACE, the first scale-aware concept erasure framework for visual autoregressive (VAR) models. It prevents catastrophic semantic collapse caused by naive application of erasure techniques from diffusion models. The framework introduces the Semantic Singularity Axiom and Incremental Semantic Saliency Analysis to surgically erase concepts with minimal overhead.

Jun 16, 2026 1 source

AIRMap AI Framework Generates Radio Maps 100x Faster Than Ray Tracing for Wireless Digital Twins

Researchers propose AIRMap, a deep-learning framework that generates radio maps from a 2D elevation map in 4 ms, over 100x faster than GPU-accelerated ray tracing. Trained on 1.2M Boston-area samples, it predicts path gain with under 4 dB RMSE. Integration into Colosseum and Sionna SYS shows near-zero error in spectral efficiency compared to measurement-based channels.

Jun 16, 2026 1 source

ActiveSAM Speeds Open-Vocabulary Segmentation 5.5x, Boosts Accuracy for Noisy-Input Domains

ActiveSAM is a training-free inference framework that improves the speed-accuracy tradeoff of open-vocabulary semantic segmentation. It achieves up to 5.5x faster inference on large-vocabulary datasets while boosting average mIoU by 1.4 points over the state-of-the-art SegEarth-OV3. The method is robust to image corruption, making it suitable for noisy real-world deployments like autonomous driving.

Jun 16, 2026 1 source

Semantic Flip: Synthetic OOD Generation for Robust Refusal in Embodied Question Answering and Spatial Localization

The Semantic Flip framework trains a lightweight rejection module on top of frozen vision-language models to detect unanswerable queries in embodied question answering and spatial localization. It synthesizes out-of-distribution pairs by transforming queries and video memory, achieving high refusal accuracy without external OOD annotations.

Jun 16, 2026 1 source

Federated Medical Image Segmentation under Real-World Label Noise: A Benchmark Suite for Noisy Label Learning Method Selection

Federated learning enables collaborative medical image segmentation without centralizing sensitive data, but real-world label noise hampers deployment. A new benchmark suite combines diverse real-world noisy datasets, client-noise scenarios, and targeted evaluation to support systematic assessment of federated noisy label learning methods, addressing the gap left by synthetic noise studies.

Jun 16, 2026 1 source

Lightweight Hardware-Aware Neural Architecture Search Enables CNNs on Ultra-Low-Power Microcontrollers

A new hardware-aware neural architecture search (HW-NAS) method generates tiny convolutional neural networks (CNNs) suitable for ultra-low-power microcontrollers, using a lightweight search procedure that can execute on embedded devices. Empirical results on three tiny computer vision benchmarks show it preserves state-of-the-art classification accuracy, addressing the power limitations of sensing nodes.

Jun 16, 2026 1 source

Multi-Sensor Fusion Technique Enhances UAV Classification Accuracy Using Image and Radar Data

Researchers proposed a multi-sensor fusion methodology that combines thermal, optronic, and radar data using a deep neural network to classify UAVs. The CNN-based architecture stacks image features from different sensors to achieve higher classification accuracy than any single sensor alone.

Jun 16, 2026 1 source

RealityBridge: New AI Framework Edits 3D Driving Simulations to Close the Sim-to-Real Gap

RealityBridge is a structure-preserving framework that edits 3D Gaussian Splatting driving simulations and bridges the gap to real-world video quality. It uses multimodal controls and autoregressive training to reduce artifacts, harmonize illumination, and ensure temporal consistency, outperforming existing methods on driving datasets.

Jun 16, 2026 1 source

FusionRS Dataset Advances Dual-Modal Vision-Language AI for Remote Sensing

Researchers introduced FusionRS, the first large-scale RGB-infrared-text dataset for dual-modal vision-language learning in remote sensing. The dataset pairs RGB and infrared images with scene and IR-aware captions, enabling models to achieve better alignment and retrieval than RGB-only approaches.

Jun 16, 2026 1 source

New Multi-Scale Two-Stream Framework Aims to Decouple Semantics from Distortions in AI-Generated Image Quality Assessment

Researchers introduce MST-CLIPIQA, a multi-scale two-stream vision-language framework that decouples semantic understanding from distortion detection to improve AI-generated image quality assessment. The method uses dual CLIP encoders and an information bottleneck gated fusion mechanism, achieving state-of-the-art results on five benchmarks with only 0.8 million trainable parameters.

Jun 16, 2026 1 source

EgoPhys Framework Creates Deformable Object Digital Twins from Single Egocentric Video

Researchers present EgoPhys, a framework that creates deformable physical digital twins from egocentric RGB video using generalizable priors. Deployed on an xArm6 robot, it enables zero-shot generalization and future prediction for elastic materials and fabrics, offering a scalable path to real-to-sim pipelines.

Jun 16, 2026 1 source

Ensemble Deep Learning Achieves 99.27% Accuracy in Lemon Leaf Disease Detection

A study on arXiv presents an ensemble deep learning approach for classifying lemon leaf diseases, achieving 99.27% accuracy. The method combines InceptionV3 and MobileNetV2 with adversarial training and Grad-CAM visualization, using a dataset of 1,354 images across 9 classes.
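
The summary does not say how the InceptionV3 and MobileNetV2 outputs are combined. One common choice is soft voting, averaging the class probabilities of the member models; the sketch below assumes that scheme, with hypothetical probability vectors in place of real model outputs.

```python
def soft_vote(prob_lists, weights=None):
    """Average per-model class probabilities; return (predicted label, averaged probabilities)."""
    m = len(prob_lists)
    weights = weights or [1.0 / m] * m  # equal weights unless given
    n_classes = len(prob_lists[0])
    avg = [sum(w * p[c] for w, p in zip(weights, prob_lists)) for c in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__), avg
```

Soft voting lets a confident model outweigh an uncertain one, which plain majority voting over hard labels cannot do with only two members.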

Jun 16, 2026 1 source

XMedFusion: A Knowledge-Guided Multimodal Perception and Reasoning Framework for Autonomous Medical Systems

Researchers introduce XMedFusion, a knowledge-guided multimodal perception and reasoning framework for autonomous medical systems. The framework decomposes visual information into coordinated agents, achieving significant improvements in radiology report generation metrics on a public chest radiograph dataset.

Jun 16, 2026 1 source

Gen-VCoT: New Framework Generates RGB Images as Visual Chain-of-Thought Intermediates for Multimodal AI Reasoning

Researchers propose Gen-VCoT, a framework that generates RGB images as visual chain-of-thought intermediates, improving spatial reasoning by 25% and depth reasoning by 50% over baseline MLLMs, though text-based CoT remains superior for simple factual queries.

Jun 16, 2026 1 source

UniBrain: A Unified Multimodal Model for Brain MRI Imputation and Understanding

Researchers propose UniBrain, a unified multimodal large language model for brain MRI analysis that handles missing data through joint imputation and understanding. The model uses interleaved data flow, self-alignment, and dynamic hidden state mechanisms to achieve high performance on multi-disease MRI datasets.

Jun 16, 2026 1 source

JoyAI-VL-Interaction Model Brings Real-Time Vision-Language AI to Enterprise Applications

JoyAI-VL-Interaction is an open-source, 8B-scale vision-language model that continuously monitors video streams and decides in real time whether to stay silent, speak, or delegate to a background model. Human raters preferred it over Doubao and Gemini in six real-world scenarios. The system includes pluggable ASR/TTS, memory, and API integration.

Jun 16, 2026 1 source

Sub-Quadratic Vision Transformers Cut Self-Attention Cost for Faster Image Captioning

A new arXiv preprint from Ghosh et al. proposes a sub-quadratic vision transformer architecture for image captioning. By replacing standard self-attention with a Gaussian Mixture Model (GMM) clustering mechanism, the model reduces computational complexity from quadratic O(n²) to O(nK), which is linear in the number of tokens n for a fixed number of clusters K. The approach uses an autoregressive GPT-based decoder and achieves competitive results on the Flickr30K dataset.
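
The summary does not detail the paper's mechanism. As a generic illustration of why clustering makes token mixing linear in n, the sketch below softly assigns each token to K fixed Gaussian means (shared isotropic variance, equal priors, no EM fitting; all assumptions), pools the tokens into K summaries, and lets each token read back from those summaries. Each step costs O(nKd) rather than the O(n²d) of full self-attention.

```python
import math


def responsibilities(x, means, var):
    """Posterior weight of each of K equal-prior isotropic Gaussians for each token."""
    resp = []
    for v in x:
        logits = [-sum((a - b) ** 2 for a, b in zip(v, m)) / (2 * var) for m in means]
        top = max(logits)  # subtract the max for numerical stability
        w = [math.exp(z - top) for z in logits]
        total = sum(w)
        resp.append([wi / total for wi in w])
    return resp


def cluster_mixing(x, means, var=1.0):
    """Mix n tokens through K cluster summaries in O(n*K*d) time."""
    n, d, k_count = len(x), len(x[0]), len(means)
    r = responsibilities(x, means, var)  # n x K soft assignments
    summaries = []
    for k in range(k_count):
        mass = sum(r[i][k] for i in range(n))
        # Cluster summary: responsibility-weighted mean of the tokens.
        summaries.append(
            [sum(r[i][k] * x[i][j] for i in range(n)) / mass if mass > 0 else 0.0
             for j in range(d)]
        )
    # Each token reads back a responsibility-weighted mix of the K summaries.
    return [[sum(r[i][k] * summaries[k][j] for k in range(k_count)) for j in range(d)]
            for i in range(n)]
```

With K = 1 every token receives the global mean, and with well-separated clusters each token receives roughly its own cluster's mean.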

Jun 16, 2026 1 source

RAMS: Resource-Adaptive Model Switching for Embedded Edge Perception Under Load

Researchers present RAMS, a runtime controller that monitors device pressure and dynamically selects among three YOLOv8 tiers on embedded hardware, achieving up to 5.6x faster inference than a fixed medium model while retaining 74% of its accuracy. The system introduces a detection-conditioned switching policy and a new scalar metric, SWAS, for offline policy comparison.
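
A runtime controller of this kind can be sketched as a small state machine over model tiers. The sketch below is pressure-only with hysteresis; it does not reproduce RAMS's detection-conditioned policy, and the tier names and thresholds are hypothetical.

```python
TIERS = ["yolov8n", "yolov8s", "yolov8m"]  # hypothetical tier names, lightest first


class TierController:
    """Pick a model tier from a device-pressure signal in [0, 1], with hysteresis."""

    def __init__(self, down_at=0.8, up_at=0.5):
        self.down_at = down_at  # step to a lighter tier when pressure exceeds this
        self.up_at = up_at      # step to a heavier tier when pressure drops below this
        self.level = len(TIERS) - 1  # start with the heaviest tier

    def update(self, pressure):
        if pressure > self.down_at and self.level > 0:
            self.level -= 1
        elif pressure < self.up_at and self.level < len(TIERS) - 1:
            self.level += 1
        return TIERS[self.level]
```

The gap between `up_at` and `down_at` keeps the controller from oscillating between tiers when pressure hovers near a single threshold.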

Jun 16, 2026 1 source

Mutual Distillation of Dual Foundation Models Achieves State-of-the-Art PET/CT Segmentation with Only 5 Labeled Cases

Researchers propose MuDuo, a mutual distillation framework that leverages two foundation models (SAM-Med3D for CT, SegAnyPET for PET) to distill knowledge into a lightweight student network for semi-supervised PET/CT segmentation. Achieving state-of-the-art performance on the AutoPET dataset with only 5 labeled cases, the approach eliminates manual prompts and maximizes unlabeled data utility.

Jun 16, 2026 1 source

Medical Image Segmentation Survey: U-Net, Transformers, SAM and Clinical Translation Challenges

A new arXiv survey systematically reviews medical image segmentation methods based on U-Net, Transformer, and SAM architectures. It covers public datasets, evaluation metrics, and key challenges, aiming to guide future research and clinical adoption. The authors have made all related resources publicly available on GitHub.
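Among the evaluation metrics such surveys cover, the Dice coefficient and intersection-over-union (IoU) are the most widely reported for segmentation. A minimal sketch on flat binary masks follows; treating two empty masks as a perfect match is an assumption, and libraries handle that edge case differently.

```python
from typing import Sequence


def dice(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for flat binary masks."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    size = sum(1 for p in pred if p) + sum(1 for t in target if t)
    return 1.0 if size == 0 else 2 * inter / size  # empty vs empty counted as perfect (assumption)


def iou(pred: Sequence[int], target: Sequence[int]) -> float:
    """IoU (Jaccard) = |A ∩ B| / |A ∪ B|."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    return 1.0 if union == 0 else inter / union


pred = [1, 1, 0, 0, 1]
target = [1, 0, 0, 1, 1]
print(round(dice(pred, target), 4), iou(pred, target))  # 0.6667 0.5
```

The two metrics are monotonically related (Dice = 2·IoU / (1 + IoU)), so they rank methods identically on a single image, though averages over a dataset can differ.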

Jun 16, 2026 1 source
Technology · Artificial Intelligence · #deep learning #object recognition

Deep Learning Enables Autonomous Logistics Vehicles to Detect and Pick Load Carriers

A research paper presents a deep-learning framework in which a convolutional neural network processes RGB-D images to identify landmarks on load carriers and compute the carriers' poses. Experiments show that the accuracy is sufficient for reliable detection in industrial environments, supporting autonomous intralogistics operations.
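Once a CNN has located landmark pixels, the depth channel turns them into 3D points via the pinhole camera model, and a pose follows from simple geometry. The sketch below assumes two landmarks at the left and right corners of the carrier's front face, a horizontally mounted camera, and made-up intrinsics; the paper's landmark set and pose solver may differ.

```python
import math
from typing import Tuple

Point3 = Tuple[float, float, float]


def back_project(u: float, v: float, depth: float,
                 fx: float, fy: float, cx: float, cy: float) -> Point3:
    """Pinhole model: pixel (u, v) plus metric depth -> camera frame (x right, y down, z forward)."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def carrier_pose(left: Point3, right: Point3) -> Tuple[float, float, float]:
    """(x, z, yaw) of the front-face midpoint on the floor plane; yaw = 0 when square to the camera."""
    x = (left[0] + right[0]) / 2
    z = (left[2] + right[2]) / 2
    yaw = math.atan2(right[2] - left[2], right[0] - left[0])
    return x, z, yaw


# Hypothetical intrinsics for a 640x480 RGB-D camera and two detected corner landmarks.
FX = FY = 600.0
CX, CY = 320.0, 240.0
left = back_project(220, 300, 2.0, FX, FY, CX, CY)
right = back_project(420, 300, 2.0, FX, FY, CX, CY)
print(carrier_pose(left, right))  # (0.0, 2.0, 0.0): centered, 2 m ahead, facing the camera
```

If the right corner reads deeper than the left one, the yaw becomes positive, telling the vehicle the carrier is rotated away on that side before it plans its approach.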

Jun 16, 2026 1 source
Technology · Artificial Intelligence · #ai #quantization

New Automated Quantization Framework AQ4SViT Compresses Spiking Vision Transformers for Embedded AI

Researchers propose AQ4SViT, an automated quantization framework for Spiking Vision Transformers that uses a search gating policy to select compression settings. It offers two variants: greedy search for speed and beam search for deeper compression. On ImageNet, it achieves up to 6.6x faster search and up to 90% memory savings while keeping accuracy within 1.5% of the original model.
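The trade-off between the two search variants can be illustrated with a toy per-layer bit-width search. Everything here is hypothetical: the three layers, their sizes, the accuracy-drop table, and the 1.5-point budget (an illustrative echo of the reported tolerance). AQ4SViT's search gating policy is not reproduced.

```python
from typing import List, Tuple

BITS = [8, 6, 4, 2]  # candidate bit-widths; index 0 = least compressed
# Everything below is a made-up toy model, not AQ4SViT's search space or cost model.
PARAMS = [1, 10, 2]  # parameters per layer (millions)
DROP = [             # cumulative accuracy drop per bit-width index, in hundredths of a point
    [0, 10, 80, 300],
    [0, 150, 200, 400],
    [0, 40, 120, 500],
]
BUDGET = 150  # tolerate at most a 1.50-point drop

Config = Tuple[int, ...]  # one bit-width index per layer


def memory(cfg: Config) -> int:
    """Weight memory in megabits."""
    return sum(p * BITS[i] for p, i in zip(PARAMS, cfg))


def acc_drop(cfg: Config) -> int:
    return sum(DROP[layer][i] for layer, i in enumerate(cfg))


def neighbors(cfg: Config) -> List[Config]:
    """Configs that lower exactly one layer by one bit-width step and stay within budget."""
    out = []
    for layer, i in enumerate(cfg):
        if i + 1 < len(BITS):
            cand = cfg[:layer] + (i + 1,) + cfg[layer + 1:]
            if acc_drop(cand) <= BUDGET:
                out.append(cand)
    return out


def greedy(cfg: Config) -> Config:
    """Repeatedly take the step with the best memory saved per unit of accuracy lost."""
    while True:
        cands = neighbors(cfg)
        if not cands:
            return cfg
        cur = cfg
        cfg = max(cands, key=lambda c: (memory(cur) - memory(c)) / (acc_drop(c) - acc_drop(cur) + 1e-9))


def beam(start: Config, width: int = 3) -> Config:
    """Keep the `width` lowest-memory feasible configs at each depth; return the best seen."""
    frontier, best, seen = [start], start, {start}
    while frontier:
        cands = {c for f in frontier for c in neighbors(f)} - seen
        seen |= cands
        frontier = sorted(cands, key=memory)[:width]
        best = min([best, *frontier], key=memory)
    return best


start = (0, 0, 0)  # every layer at 8 bits
for name, cfg in [("greedy", greedy(start)), ("beam", beam(start))]:
    print(name, [BITS[i] for i in cfg], memory(cfg), acc_drop(cfg))
# greedy [6, 8, 4] 94 130
# beam [8, 6, 8] 84 150
```

Greedy commits early to cheap, low-payoff steps and runs out of budget before it can touch the large layer, while beam search keeps that more expensive but more rewarding option alive and ends with less memory, at the price of evaluating more candidates.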

Jun 16, 2026 1 source
Technology · Artificial Intelligence · #ct reconstruction #sparse-view

LUCID AI Framework Enhances Sparse-View CT Reconstruction with Flow Matching and Consistency Guidance

Researchers propose LUCID, a sparsity-adaptive, consistency-guided framework for sparse-view CT reconstruction that uses flow matching to generate high-quality images from undersampled projection data. Because sparse-view scans acquire fewer projections, the approach can reduce radiation dose and scanning time, and the method improves image quality and structural fidelity.
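For readers new to flow matching, the sketch below shows the generic straight-path recipe: the path x_t = (1 - t)·x0 + t·x1 from noise x0 to image x1 has constant target velocity x1 - x0, and sampling Euler-integrates a velocity field from t = 0 to t = 1. A simple consistency step is added that keeps measured entries on their path toward the measurement. The "measurement" here is a hypothetical set of observed pixel values standing in for the projection data a CT method would check against, and the velocity field is a hand-written toy; LUCID's sparsity-adaptive guidance is not reproduced.

```python
from typing import Callable, Dict, List

Vec = List[float]


def interpolate(x0: Vec, x1: Vec, t: float) -> Vec:
    """Straight path from noise x0 (t = 0) to data x1 (t = 1)."""
    return [(1 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0: Vec, x1: Vec) -> Vec:
    """Flow-matching regression target along that path: dx_t/dt = x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]


def sample(velocity: Callable[[Vec, float], Vec], x0: Vec, steps: int,
           observed: Dict[int, float]) -> Vec:
    """Euler-integrate dx/dt = velocity(x, t) from t = 0 to 1 with a consistency step.

    observed maps entry index -> measured value (a toy stand-in for measurement data).
    """
    x, dt = list(x0), 1.0 / steps
    for k in range(steps):
        v = velocity(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        t_next = (k + 1) * dt
        for i, y in observed.items():
            # Put each measured entry back on its straight path toward the measurement.
            x[i] = (1 - t_next) * x0[i] + t_next * y
    return x


def prior_velocity(x: Vec, t: float) -> Vec:
    """Hypothetical 'learned' field: flows every entry toward a flat template value of 0.5."""
    return [(0.5 - xi) / (1 - t) for xi in x]


x0 = [0.0, 1.0, -1.0, 0.3]  # initial noise sample
recon = sample(prior_velocity, x0, steps=10, observed={0: 0.9, 2: 0.1})
print([round(v, 6) for v in recon])  # [0.9, 0.5, 0.1, 0.5]
```

Unmeasured entries end where the prior field sends them, while measured entries are forced to agree with the data; in a real reconstructor, the learned velocity and the consistency step would interact so that the prior fills in only what the sparse views cannot determine.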

Jun 16, 2026 1 source
Technology · Artificial Intelligence · #zero-watermarking #ai editing

Rel-Zero: Harnessing Patch-Pair Invariance for Robust Zero-Watermarking Against AI Editing

Rel-Zero is a zero-watermarking framework built on the observation that relational distances between pairs of image patches remain largely invariant under AI editing. Instead of embedding a mark in the image, it derives the watermark from this intrinsic structural consistency, enabling non-invasive content authentication with improved robustness over prior approaches.
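Zero-watermarking leaves the image untouched. A feature bit string is extracted from the image, XOR-ed with the owner's watermark, and the result (the key) is registered with a trusted party; verification re-extracts the features from a suspect image and XORs them with the key. The sketch below uses a simple stand-in feature, whether patch a is brighter than patch b for a fixed list of patch pairs, in place of Rel-Zero's relational distances. The patch size, pair selection, and function names are assumptions.

```python
import random
from typing import List, Tuple

Image = List[List[float]]  # grayscale values, row-major


def patch_means(img: Image, size: int) -> List[float]:
    """Mean intensity of each non-overlapping size x size patch, row by row."""
    h, w = len(img), len(img[0])
    means = []
    for r in range(0, h - size + 1, size):
        for c in range(0, w - size + 1, size):
            vals = [img[r + i][c + j] for i in range(size) for j in range(size)]
            means.append(sum(vals) / len(vals))
    return means


def relation_bits(img: Image, pairs: List[Tuple[int, int]], size: int) -> List[int]:
    """One bit per patch pair: is patch a brighter than patch b? (hypothetical relation feature)"""
    m = patch_means(img, size)
    return [1 if m[a] > m[b] else 0 for a, b in pairs]


def register(img: Image, watermark: List[int], pairs, size: int) -> List[int]:
    """Key = feature bits XOR watermark. The image itself is never modified."""
    return [f ^ w for f, w in zip(relation_bits(img, pairs, size), watermark)]


def verify(img: Image, key: List[int], pairs, size: int) -> List[int]:
    """Recover the watermark from a (possibly edited) image and the registered key."""
    return [f ^ k for f, k in zip(relation_bits(img, pairs, size), key)]


rng = random.Random(0)
img = [[rng.random() for _ in range(8)] for _ in range(8)]
pairs = [tuple(rng.sample(range(16), 2)) for _ in range(8)]  # 16 patches of 2x2 in an 8x8 image
watermark = [1, 0, 1, 1, 0, 0, 1, 0]
key = register(img, watermark, pairs, size=2)
edited = [[0.8 * v + 0.1 for v in row] for row in img]  # global contrast/brightness change
print(verify(edited, key, pairs, size=2) == watermark)  # True
```

The global brightness/contrast change here is only a simple stand-in for an edit: it preserves every pairwise ordering, so the watermark survives. Robustness to real AI edits depends on choosing relations that such edits also preserve, which is the property the paper targets.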

Jun 16, 2026 1 source
Technology · Artificial Intelligence · #geass #gated evidence-adaptive

GEASS: Gated Evidence-Adaptive Selective Caption Trust Tackles VLM Hallucination

Vision-language models often hallucinate objects, and feeding them their own captions can actually worsen accuracy. Researchers propose GEASS, a gated evidence-adaptive module that decides per query how much of the caption to trust, improving accuracy across four VLMs on two benchmarks without training or additional parameters.
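One way to picture per-query caption trust is a gate that blends the model's answer distributions computed with and without the caption in context. In the sketch below, the gate is a hypothetical rule based on how far the caption moves the image-only answer (total variation distance) and a fixed tolerance `tau`; the summary does not describe GEASS's actual evidence signal or gating function, so neither is reproduced here.

```python
from typing import Dict

Dist = Dict[str, float]  # answer -> probability


def total_variation(p: Dist, q: Dist) -> float:
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def gated_answer(p_image: Dist, p_caption: Dist, tau: float = 0.5) -> Dist:
    """Blend answer distributions computed without and with the caption in the prompt.

    Hypothetical gate: the more the caption shifts the image-only answer, the less it is
    trusted; gate = 1 means full caption trust, 0 means ignore the caption. No training.
    """
    gate = max(0.0, 1.0 - total_variation(p_image, p_caption) / tau)
    keys = set(p_image) | set(p_caption)
    return {k: gate * p_caption.get(k, 0.0) + (1 - gate) * p_image.get(k, 0.0) for k in keys}


# Query: "Is there a dog in the image?"
p_img = {"yes": 0.2, "no": 0.8}
supportive = {"yes": 0.1, "no": 0.9}     # caption agrees with the visual evidence
hallucinated = {"yes": 0.9, "no": 0.1}   # caption mentions a dog the image does not support
print(gated_answer(p_img, supportive))    # caption mostly trusted (gate about 0.8)
print(gated_answer(p_img, hallucinated))  # caption ignored (gate 0), falls back to p_img
```

A gate of this kind adds no trainable parameters; whether "disagreement with the image-only answer" is the right evidence signal is exactly the kind of design question the paper's evaluation across four VLMs would settle.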

Jun 16, 2026 1 source
Technology · Artificial Intelligence · #neural energy fields #physics simulation

NEXUS: Neural Energy Fields Improve Physics Consistency in 3D Object Dynamics Simulations

NEXUS is a neural energy-field framework for contact-rich 3D object dynamics, representing objects as structural graphs and formulating motion through scalar energy and dissipation terms. It improves long-horizon accuracy over existing baselines and provides effective guidance for physically plausible video generation.
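Formulating motion through an energy and a dissipation term means forces come from the negative gradient of a scalar potential, and damping comes from the gradient of a Rayleigh dissipation function, so the simulated system cannot gain energy on its own. The sketch below works in 2D for brevity, uses a hand-written spring energy on a small structural graph in place of NEXUS's learned neural energy field, and omits contact; the stiffness, damping, and time step are arbitrary assumptions.

```python
import math
from typing import List, Tuple

Edge = Tuple[int, int, float]  # (node i, node j, rest length)


def spring_energy_and_forces(pos: List[List[float]], edges: List[Edge], k: float):
    """E = sum over edges of 0.5*k*(|x_j - x_i| - L)^2; returns (E, force = -dE/dx per node)."""
    forces = [[0.0, 0.0] for _ in pos]
    energy = 0.0
    for i, j, rest in edges:
        dx, dy = pos[j][0] - pos[i][0], pos[j][1] - pos[i][1]
        length = math.hypot(dx, dy)
        stretch = length - rest
        energy += 0.5 * k * stretch ** 2
        fx, fy = k * stretch * dx / length, k * stretch * dy / length
        forces[i][0] += fx  # a stretched edge pulls i toward j ...
        forces[i][1] += fy
        forces[j][0] -= fx  # ... and j toward i (equal and opposite)
        forces[j][1] -= fy
    return energy, forces


def simulate(pos, edges, k=10.0, c=0.5, mass=1.0, dt=0.01, steps=500) -> List[float]:
    """Semi-implicit Euler with damping -c*v, the gradient of D = 0.5*c*|v|^2.

    Returns the total (potential + kinetic) energy before each step.
    """
    pos = [list(p) for p in pos]
    vel = [[0.0, 0.0] for _ in pos]
    history = []
    for _ in range(steps):
        potential, forces = spring_energy_and_forces(pos, edges, k)
        kinetic = sum(0.5 * mass * (v[0] ** 2 + v[1] ** 2) for v in vel)
        history.append(potential + kinetic)
        for p, v, f in zip(pos, vel, forces):
            for d in range(2):
                v[d] += dt * (f[d] - c * v[d]) / mass  # update velocity first ...
                p[d] += dt * v[d]                      # ... then position with the new velocity
    return history


# Triangle with unit rest lengths, starting slightly stretched (apex at 0.9 instead of ~0.866).
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)]
history = simulate([[0.0, 0.0], [1.0, 0.0], [0.5, 0.9]], edges)
print(f"energy: {history[0]:.5f} -> {history[-1]:.5f}")
```

Because every force is derived from one scalar function, the dynamics stay physically consistent by construction; a learned energy network would replace `spring_energy_and_forces` while keeping the same gradient-based structure.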

Jun 16, 2026 1 source
Technology · Artificial Intelligence · #game-theory #diffusion-models

Divide-and-Denoise: Game-Theoretic Method Ensures Fair Composition of Diffusion Models

Researchers propose Divide-and-Denoise, a game-theoretic method for fairly composing multiple pre-trained diffusion models. At each timestep, an allocation divides the noisy sample into regions among the component models, maximizing utility under fairness constraints. The method outperforms baselines on the GenEval benchmark, addressing common failures such as missing objects and mismatched attributes.
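The allocation step can be illustrated with round-robin picking, a classic fair-division rule in which agents take turns claiming their most-valued remaining item; for additive utilities it guarantees envy-freeness up to one item. Here each agent is a pre-trained diffusion model, each item is a pixel (or latent patch), and the utility numbers are hypothetical. This is a stand-in rule: the summary does not specify the paper's utility or fairness constraints.

```python
from typing import List


def round_robin(utility: List[List[float]]) -> List[int]:
    """utility[m][p]: value of pixel p to model m. Models pick in turns; returns owner[p]."""
    n_models, n_pixels = len(utility), len(utility[0])
    owner = [-1] * n_pixels
    remaining = set(range(n_pixels))
    turn = 0
    while remaining:
        m = turn % n_models
        best = max(remaining, key=lambda p: (utility[m][p], -p))  # ties: lowest pixel index
        owner[best] = m
        remaining.remove(best)
        turn += 1
    return owner


def compose(noise_preds: List[List[float]], owner: List[int]) -> List[float]:
    """Stitch per-model noise predictions by region ownership for one denoising step."""
    return [noise_preds[m][p] for p, m in enumerate(owner)]


utility = [
    [0.9, 0.8, 0.1, 0.2],  # model 0, e.g. prompted with "a cat" (hypothetical)
    [0.1, 0.7, 0.9, 0.6],  # model 1, e.g. prompted with "a dog" (hypothetical)
]
owner = round_robin(utility)
print(owner)  # [0, 0, 1, 1]
print(compose([[1.0] * 4, [2.0] * 4], owner))  # [1.0, 1.0, 2.0, 2.0]
```

Recomputing the allocation at every timestep, as the summary describes, lets the regions shift as the image takes shape, and the fairness rule keeps one model from claiming the whole canvas, which is one way a concept goes missing.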

Jun 16, 2026 1 source
Technology · Artificial Intelligence · #segment anything model #seismic interpretation

Domain-Guided Prompting Boosts Segment Anything Model for Seismic Interpretation

Researchers introduce a domain-guided prompting framework for the Segment Anything Model (SAM) that enables zero-shot seismic interpretation without retraining. By aligning seismic attributes and colormaps with geological targets and combining point and mask prompts, the approach improves segmentation accuracy and boundary delineation while reducing both reliance on labeled data and computational cost.
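A hybrid prompt set could be assembled from a seismic attribute map roughly as follows. The percentile clipping, the "strongest responses become positive points, weakest become negative points" heuristic, and the thresholded coarse mask are illustrative assumptions, not the paper's procedure. The output follows SAM's prompt conventions: point coordinates in (x, y) pixel order, label 1 for foreground and 0 for background, plus an optional mask; in practice these lists would be converted to arrays before being passed to a SAM predictor.

```python
from typing import List

Grid = List[List[float]]


def percentile(values: List[float], q: float) -> float:
    """Simple index-based percentile (q in [0, 100]) without interpolation."""
    s = sorted(values)
    return s[min(len(s) - 1, max(0, round(q / 100 * (len(s) - 1))))]


def normalize(attr: Grid, lo_q: float = 2, hi_q: float = 98) -> Grid:
    """Clip a seismic attribute to robust percentiles and rescale to [0, 1]."""
    flat = [v for row in attr for v in row]
    lo, hi = percentile(flat, lo_q), percentile(flat, hi_q)
    span = (hi - lo) or 1.0
    return [[min(1.0, max(0.0, (v - lo) / span)) for v in row] for row in attr]


def build_prompts(attr: Grid, n_pos: int = 2, n_neg: int = 1, mask_thresh: float = 0.7):
    """Hybrid prompts: strongest responses as positive points, weakest as negative points,
    plus a thresholded coarse mask. Points are (x, y); labels are 1 = foreground, 0 = background."""
    norm = normalize(attr)
    cells = sorted(((v, x, y) for y, row in enumerate(norm) for x, v in enumerate(row)), reverse=True)
    pos = [(x, y) for _, x, y in cells[:n_pos]]
    neg = [(x, y) for _, x, y in cells[-n_neg:]]
    labels = [1] * len(pos) + [0] * len(neg)
    mask = [[1 if v >= mask_thresh else 0 for v in row] for row in norm]
    return pos + neg, labels, mask


# Toy amplitude section with a strong reflector (horizon) in row 1.
attr = [
    [0.1, 0.2, 0.1, 0.0],
    [0.9, 1.0, 0.95, 0.8],
    [0.2, 0.1, 0.0, 0.1],
    [-0.5, 0.0, 0.1, 0.2],
]
coords, labels, mask = build_prompts(attr)
print(coords, labels)  # [(1, 1), (2, 1), (0, 3)] [1, 1, 0]
print(mask[1])         # [1, 1, 1, 1]
```

Combining the two prompt types plays to their strengths: points anchor SAM on the target horizon, while the coarse mask conveys its lateral extent, which point prompts alone describe poorly for long, thin reflectors.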

Jun 16, 2026 1 source
Technology · Artificial Intelligence · #deep learning #remote sensing

Multi-Modal Attention Model Achieves 94.9% Accuracy in Automated Disaster Damage Classification Using Satellite Imagery

Researchers have developed a deep learning framework that automates building damage classification from satellite imagery. The model uses a multi-modal attention mechanism to fuse pre- and post-disaster images and classifies damage into four levels with 94.90% accuracy, which could substantially speed up damage assessment and aid emergency responders.
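The fusion step can be sketched as cross-attention in which post-disaster feature tokens query pre-disaster tokens, so each post-event location is compared with what stood there before; the change features are then pooled and mapped to four damage classes. The feature vectors, the linear head weights, and the class names (borrowed from the common no/minor/major/destroyed scale) are hypothetical; the summary does not specify the paper's architecture.

```python
import math
from typing import List, Tuple

Vec = List[float]
CLASSES = ["no damage", "minor damage", "major damage", "destroyed"]  # assumed 4-level scale


def softmax(xs: Vec) -> Vec:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_attention(queries: List[Vec], keys: List[Vec], values: List[Vec]) -> List[Vec]:
    """Scaled dot-product attention: each query returns a weighted mix of the values."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        w = softmax([dot(q, k) * scale for k in keys])
        out.append([sum(wi * v[d] for wi, v in zip(w, values)) for d in range(len(values[0]))])
    return out


def classify(pre: List[Vec], post: List[Vec], weights: List[Vec], bias: Vec) -> Tuple[str, Vec]:
    """Post-disaster tokens query pre-disaster tokens; change features feed a linear head."""
    context = cross_attention(post, pre, pre)  # "what stood here before"
    fused = [[abs(p - c) for p, c in zip(pt, ct)] + pt for pt, ct in zip(post, context)]
    pooled = [sum(col) / len(fused) for col in zip(*fused)]  # mean over tokens
    probs = softmax([dot(w, pooled) + b for w, b in zip(weights, bias)])
    return CLASSES[max(range(len(probs)), key=probs.__getitem__)], probs


# Hypothetical head: larger pre/post change pushes toward more severe classes.
W = [[-4, -4, 0, 0], [1, 1, 0, 0], [2, 2, 0, 0], [3, 3, 0, 0]]
B = [1.0, 0.0, -1.0, -2.5]
pre = [[3.0, 0.0], [0.0, 3.0]]
print(classify(pre, pre, W, B)[0])                       # no damage
print(classify(pre, [[0.0, 0.0], [0.0, 0.0]], W, B)[0])  # destroyed
```

Letting the post-event view query the pre-event view, rather than simply concatenating the two images, makes the model compare like with like, which is what distinguishes a collapsed roof from a roof that was always dark.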

Jun 16, 2026 1 source
Technology · Artificial Intelligence · #deepfake detection #teacher-student

Teacher-Student Domain Adaptation Boosts Ensemble Audio-Visual Deepfake Detection by Up to 18%

Researchers propose EAV-DFD, an ensemble audio-visual deepfake detection model with a teacher-student domain adaptation mechanism. With FakeAVCeleb as the primary domain and three unseen datasets (DFDC, Deepfake_TIMIT, PolyGlotFake) as targets, it improved AUC by 4.09%, 17.94%, and 0.5%, respectively, using only a small portion of target-domain data.
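Teacher-student domain adaptation is often implemented as a mean teacher: the teacher's weights are an exponential moving average (EMA) of the student's, and the teacher's confident predictions on unlabeled target-domain clips become pseudo-labels for the student. The sketch below shows that generic recipe together with a simple late-fusion audio-visual ensemble; the EMA decay, confidence threshold, and equal-weight fusion are assumptions, and EAV-DFD's specific mechanism may differ.

```python
from typing import List, Optional


def ema_update(teacher: List[float], student: List[float], decay: float = 0.99) -> List[float]:
    """Teacher <- decay * teacher + (1 - decay) * student, parameter by parameter."""
    return [decay * t + (1 - decay) * s for t, s in zip(teacher, student)]


def ensemble_score(p_audio: float, p_visual: float, w_audio: float = 0.5) -> float:
    """Late fusion of per-modality 'fake' probabilities (equal weights by default)."""
    return w_audio * p_audio + (1 - w_audio) * p_visual


def pseudo_label(p_fake: float, threshold: float = 0.9) -> Optional[int]:
    """Keep only confident teacher predictions on unlabeled target-domain clips."""
    if p_fake >= threshold:
        return 1  # confidently fake
    if p_fake <= 1 - threshold:
        return 0  # confidently real
    return None  # too uncertain: skip this clip


teacher, student = [0.0, 1.0], [1.0, 0.0]  # toy parameter vectors
for _ in range(3):
    teacher = ema_update(teacher, student, decay=0.5)
print(teacher)  # [0.875, 0.125]
print(pseudo_label(ensemble_score(0.95, 0.9)), pseudo_label(ensemble_score(0.6, 0.2)))  # 1 None
```

The slowly moving teacher gives steadier pseudo-labels than the student itself would, which matters when only a small slice of target-domain data is available for adaptation.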

Jun 16, 2026 1 source
Technology · Artificial Intelligence · #sensor-conditioned #representation learning

Sensor-Conditioned Representation Learning Uses Scene-Relevant Observation Quotients to Improve Latent Geometry

Researchers propose a sensor-conditioned representation learning framework based on scene-relevant observation quotients. Their method, OQ-TSAE, tested on synthetic and real radar data, improves representation-correctness diagnostics over reconstruction, metric-learning, and contrastive baselines.

Jun 16, 2026 2 sources
Technology · Artificial Intelligence · #omnitraffic #controllable generation

OmniTraffic Pipeline Enables Controlled Training of Spatio-Temporal Traffic AI for Logistics

Researchers introduce OmniTraffic, a controllable generation pipeline and benchmark for spatio-temporal traffic reasoning. Built on 12 real-world intersections and surveillance footage from two countries, it generates 8M VQA samples and a 3K human-verified test set. Evaluation of 11 frontier MLLMs shows a large human-model gap, especially in topology-grounded reasoning. Fine-tuning on OmniTraffic data improves real-world performance, offering a valuable tool for logistics and supply chain AI.

Jun 16, 2026 1 source