Computer Vision

Hardware #security cameras#subscription-free

Subscription-Free Security Cameras Offer Local AI and Privacy for Enterprise Surveillance

Subscription-free security cameras are closing the gap with cloud-based systems, offering local AI detection and storage. WIRED's Simon Hill tests top models from Eufy, Synology, and IC Realtime, highlighting cost savings and data privacy for enterprise use.

Jul 26, 2026 1 source

Hyderabad Researchers Develop AI-Powered Plant Leaf Disease Detection System with 96% Accuracy

Artificial Intelligence #ai#plant disease

A team led by Vijaya Saraswathi at VNR Vignana Jyothi Institute of Engineering and Technology in Hyderabad has patented an AI-powered leaf disease detection system that uses a convolutional neural network trained on over 20,000 images to identify diseases in tomato, potato, and pepper crops with 96% accuracy. The system also recommends pesticides and is planned for mobile app deployment.

Jul 21, 2026 1 source

Meta's NameTag Face Recognition: Code Exists, Feature 'Doesn't' – What Enterprise Buyers Must Know

Artificial Intelligence #meta#nametag

WIRED reports that Meta's NameTag face recognition code had been embedded in the Meta AI app since January 2025, even though Meta maintains the feature does not exist. CTO Andrew Bosworth described it in detail on a podcast, while VP Andy Stone stated that it doesn't exist. The code was removed after WIRED's story. The incident highlights how hard it is for enterprise buyers to verify AI product claims.

Jul 15, 2026 1 source

New York Governor Signs First Statewide Data Center Moratorium, Halting Hyperscale Development for One Year

Cloud Computing #new york#data center

New York Governor Kathy Hochul signed an executive order enacting a one-year moratorium on hyperscale data centers over 50 megawatts, the first statewide pause in the US. The order directs the Department of Public Service to evaluate environmental and energy impacts and proposes ending tax incentives. The move follows growing opposition and a legislative bill with stricter limits.

Jul 14, 2026 1 source

EU Parliament Votes to Extend Big Tech's Right to Scan Private Messages Despite Majority Opposition

Cybersecurity #european lawmakers#big tech

The European Parliament has voted to extend legislation allowing tech companies like Meta, Google, and Microsoft to voluntarily scan users' private messages for child sexual abuse material, despite a majority of lawmakers voting against the proposal. The vote reinstates permissions for scanning private text, email, and social media messages, but end-to-end encrypted chats remain exempt.

Jul 9, 2026 1 source

Bi-Anchor Interpolation Solver Cuts Generative Modeling Steps from 100 to 10, Researchers Show

Artificial Intelligence #generative modeling#artificial intelligence

Researchers introduce the Bi-Anchor Interpolation Solver (BA-solver) for accelerating flow matching generative models. It achieves quality comparable to 100+ step solvers in just 10 steps, using a small SideNet (1-2% of backbone size) and novel bidirectional temporal perception. The method is plug-and-play with existing pipelines.

SARLO-80: New Dataset Combines Very-High-Resolution SAR and Optical Imagery with Language Descriptions

Artificial Intelligence #sar#dataset

Researchers have released SARLO-80, a large-scale dataset combining very-high-resolution synthetic aperture radar (SAR) imagery, aligned optical imagery, and natural-language descriptions. Built from Umbra spotlight acquisitions, the dataset contains 119,566 triplets across 72 countries, standardized to 80cm slant-range resolution. It aims to advance multimodal foundation models for SAR by providing complex-valued measurements and native acquisition geometry.

REVEAL++: Continuous Phenotypic Grouping Improves Vision-Language Retinal Model for Alzheimer's Risk

Artificial Intelligence #alzheimer's#retinal imaging

Researchers propose REVEAL++, a vision-language model that models phenotypic similarity as a continuous signal rather than discrete clusters, improving Alzheimer's disease risk prediction from retinal fundus images. Evaluated on UK Biobank data, it outperforms prior baselines by using a soft-target contrastive objective.

New Research Reveals How Visual Tokens Evolve Inside Vision-Language Models

Artificial Intelligence #vision language models#computer vision

A new computer vision paper on arXiv investigates how visual tokens are integrated into large language models (LLMs) under two paradigms: in-context prompting and layer-wise injection. The authors find that visual tokens enter the LLM as 'disguised visual context' lacking linguistic structure, then evolve differently depending on the integration architecture. They show that attention allocation alone is insufficient and that performance depends on the quality of visual representations at each layer.

FlowMaps: Modeling Long-Term Multimodal Object Dynamics with Flow Matching

Artificial Intelligence #flow matching#object dynamics

FlowMaps, a latent flow matching model, predicts multimodal distributions of future object locations in 3D space by learning from past human interactions. Tested in over 600 episodes, it outperforms state-of-the-art approaches for dynamic Object Navigation tasks in simulated and real environments. The research, published on arXiv, has potential applications for robotics in changing environments.

New Framework for Class-Incremental Motion Forecasting Enables Autonomous Vehicles to Adapt to Novel Objects

Artificial Intelligence #class-incremental#motion forecasting

Researchers introduce class-incremental motion forecasting, a setting where autonomous vehicles learn new object classes over time. They propose the first end-to-end framework that adapts to novel classes while mitigating catastrophic forgetting, using pseudo-labels and open-vocabulary segmentation. Evaluations on nuScenes and Argoverse 2 show preserved performance on known classes and effective adaptation to new ones.

Mitigating Simplicity Bias in OOD Detection through Object Co-occurrence Analysis

Artificial Intelligence #ood detection#simplicity bias

Researchers propose an object-centric OOD detection framework that leverages object co-occurrence patterns to overcome simplicity bias, achieving competitive results on near-OOD and full-spectrum settings.

New Framework GeoVR Learns 3D Spatial Intelligence from 2D Videos for Multimodal LLMs

Artificial Intelligence #artificial intelligence#multimodal

Multimodal Large Language Models (MLLMs) traditionally lack intrinsic 3D awareness. Researchers present GeoVR, a framework that learns geometric representations from 2D video sequences, restructuring the semantic latent space to unlock spatial intelligence. GeoVR uses four complementary geometric targets from pre-trained 3D foundation models, achieving state-of-the-art performance on spatial reasoning benchmarks.

VCG: Multimodal Retrieval Framework Solves Extreme Cold-Start Problem for E-Commerce Video Feeds

Artificial Intelligence #video retrieval#multimodal

E-commerce platforms are shifting to video feeds but face extreme cold-start problems because new videos lack interaction history. The VCG system, described in a recent arXiv paper, uses a CLIP-based multimodal retrieval engine to map users and videos into a shared semantic space, enabling zero-shot retrieval. Online A/B testing showed a 50% uplift in deep video completion, demonstrating effective mitigation of engagement biases.

TeleMorpher: New AI Framework Edits Video Motion and Location Simultaneously

Artificial Intelligence #telemorpher#motion-editing

Researchers have developed TeleMorpher, a one-shot framework for simultaneous motion and location editing in video. The approach leverages motion priors, segmentation, and training-free pose warping to achieve robust edits while preserving appearance. Experiments show superior performance on in-the-wild videos and the TaiChi dataset.

Property title search engine Landeed to invest ₹30cr in FY27 for business growth

Startups #property#title search

Property title search engine Landeed will invest ₹30 crore in FY27 to expand its business. The company, which raised around ₹100 crore from investors including Y Combinator, will focus on GPU infrastructure, small language models, and Indic OCR to convert fragmented property records into structured intelligence.

Jul 7, 2026 1 source

How AI is Empowering Indian Farmers with Smarter Supply Chains and Market Access

Artificial Intelligence #ai#artificial intelligence

Indian agriculture is leveraging AI to address rising costs, pest spread, and market volatility. From the Government's Kisan e-Mitra chatbot to AI-driven crop diagnosis and precision farming, these tools help farmers make better decisions and improve supply chain efficiency. Enterprise applications include demand prediction, logistics optimization, and credit risk assessment.

Jul 5, 2026 1 source

EU Politician Investigating Pegasus Spyware Was Hacked With the Same Malware, Citizen Lab Finds

Cybersecurity #spyware#pegasus

A new analysis by Citizen Lab reveals that Greek MEP Stelios Kouloglou, a member of the European Parliament's PEGA Committee investigating Pegasus spyware, had his iPhone hacked multiple times with the same spyware. The incident marks the first time a committee member has been identified as a victim and highlights the brazen targeting of European lawmakers. Researchers could not identify the attacker but warn of severe security implications for parliamentary work.

Jul 3, 2026 1 source

Samsara Ride Along pushes fleet safety AI beyond incident flagging to continuous driver monitoring

Artificial Intelligence #samsara#fleet safety

At its Beyond 2026 conference, Samsara introduced Ride Along, an AI feature that conducts virtual ride-alongs of 10-30 minutes to build a full behavioral picture of each driver, shifting from rare incident flagging to continuous positive reinforcement. UNFI was an early beta tester, and the company also announced an in-cab conversational AI agent.

Jun 26, 2026 1 source

How Flipkart Uses Generative AI to Shift E-Commerce from Search to Intent-Led Commerce

Artificial Intelligence #ai#artificial intelligence

At the ETRetail E-Commerce and Digital Natives Summit 2026, Flipkart executives outlined how generative AI and agentic systems are transforming e-commerce from search-driven transactions to intent-led commerce. The shift enables conversational discovery, AI-powered seller tools, and agent-based workflows across the value chain.

Jun 24, 2026 1 source

Meta Launches $299 Smart Glasses With Own Branding, Drops Ray-Ban Name

Hardware #meta#smart glasses

Meta announced three new smart glasses models (Adventurer, Fury, and Starfire) priced at $299, dropping Ray-Ban branding in favor of Meta's own name. The glasses offer adjustable comfort, customizable frames, and 8-hour battery life. A Kylie Jenner collaboration adds a custom AI voice and design. The launch puts Meta in competition with Snap's recently launched AR Specs.

Jun 23, 2026 1 source

Federal Workers Cannot Delete White House App Forced Onto Government Phones

Software #app#white house

The White House's new mobile app has been automatically downloaded onto work phones of millions of federal employees, who say they cannot delete it and that it reappears after removal. Cybersecurity experts flagged data-sharing issues, including initial sharing of location and IP addresses with third parties, and widgets from a Russia-based company that exposed officials' personal information.

Jun 23, 2026 1 source

Controlled Benchmark Finds No Quantum Advantage in Brain MRI Data Augmentation

Artificial Intelligence #quantum-latent gan#gan

A controlled benchmark study by Haider and Figini shows that quantum-latent GAN augmentation does not improve brain MRI classification over real-data-only training or classical GANs. The quantum and classical generators were statistically indistinguishable across all data fractions from 5% to 100%.

LLM Paraphrase Augmentation Boosts Sign Language Translation Performance

Artificial Intelligence #sign language#translation

A new study proposes using a large language model (GPT-4o) to generate controlled paraphrase variants of training targets for sign language translation (SLT). Evaluated on three datasets, the method yields a modest BLEU-4 improvement on PHOENIX14T and reveals gains in semantic fidelity not captured by lexical metrics.

CADBench: A Multimodal Benchmark for AI-Assisted CAD Program Generation

Artificial Intelligence #ai#cad

CADBench is a unified benchmark for multimodal CAD program generation, containing 18,000 evaluation samples across six benchmark families, five input modalities, and six metrics. The benchmark evaluates eleven AI systems, generating over 1.4 million CAD programs, and reveals key failure modes in current approaches.

Breast MRI AI Challenge Reveals Trade-Offs Between Accuracy and Fairness Across Patient Subgroups

Artificial Intelligence #breast mri#tumor segmentation

The MAMA-MIA Challenge provided a standardized benchmark for breast MRI tumor segmentation and pathologic complete response prediction. Using a training cohort of 1,506 patients from US institutions and an external test set of 574 patients from three European centers, 26 international teams showed substantial performance variability and trade-offs between overall accuracy and subgroup fairness across age, menopausal status, and breast density.

FreeStyle: Scalable Style-Content Dual-Reference Generation via Community LoRA Mining

Artificial Intelligence #generative ai#lora

FreeStyle is a scalable dual-reference generation framework that leverages community LoRAs as compositional anchors for style and content. It introduces a two-stage curriculum with attention-level enrichment and frequency-aware RoPE modulation to suppress leakage from style references. The framework is evaluated on a new benchmark covering style similarity, content preservation, and leakage rejection, achieving a strong balance among these objectives.

New AI Research Shows Vision-Language Models Think Better with Visual Grounding

Artificial Intelligence #vlm#vision-language model

Researchers introduce visually grounded thinking, a reasoning process that interleaves natural-language thoughts with explicit point or box groundings to image regions. The method, using a scalable synthesis pipeline and grounding-aware reinforcement learning, consistently improves performance of Gemma3-4B-IT on counting and spatial reasoning benchmarks, with the 4B model matching or surpassing the 27B variant.

Jun 21, 2026 2 sources

HilDA: Hierarchical Distillation with Diffusion Advances Self-Supervised LiDAR Pre-Training

Artificial Intelligence #lidar#self-supervised

Researchers propose HilDA, a self-supervised pretraining framework for LiDAR backbones that uses hierarchical distillation and temporal occupancy diffusion. The method achieves state-of-the-art results on cross-modal distillation benchmarks for 3D object detection, scene flow, and semantic occupancy prediction.

MakeupMirror Model Boosts Facial Attribute Preservation in Diffusion-Based Makeup Transfer

Artificial Intelligence #makeup transfer#diffusion models

Researchers propose MakeupMirror, a diffusion-based makeup transfer model that preserves facial identity and skin tone better than previous solutions. It achieves 60% higher facial recognition similarity, 50% lower skin tone difference, and 0.7s latency, with 94% expert acceptance, advancing virtual try-on for e-commerce.

DF3DV-1K: Large-Scale Dataset and Benchmark for Distractor-Free Novel View Synthesis

Artificial Intelligence #dataset#benchmark

Researchers introduced DF3DV-1K, a large-scale real-world dataset with 1,048 scenes and 89,924 images for distractor-free novel view synthesis. The dataset spans 128 distractor types and 161 scene themes, enabling benchmarking of nine radiance field methods and 3D Gaussian Splatting. Fine-tuning a diffusion-based 2D enhancer on DF3DV-1K achieved average improvements of 0.96 dB PSNR and 0.057 LPIPS.

Tri-Info Method Predicts VLA Model Failures with 83% Accuracy Across Real-World Tasks, Researchers Report

Artificial Intelligence #tri-info#failure prediction

Researchers propose Tri-Info, a method using information theory to detect failures in Vision-Language-Action (VLA) models. It matches top baselines in-domain and achieves 83% accuracy on real-world tasks, with interpretable diagnostics.

Latent Gaussian Splatting Achieves State-of-the-Art in 4D Panoptic Occupancy Tracking for Robots

Artificial Intelligence #latent#gaussian splatting

Researchers introduce Latent Gaussian Splatting (LaGS) for 4D Panoptic Occupancy Tracking, a method that models 3D features as sparse Gaussians for continuous spatiotemporal scene understanding. It achieves state-of-the-art results on Occ3D nuScenes and Waymo datasets, addressing limitations of bounding box tracking and static occupancy estimation.

Unsupervised Algorithms Cut Annotation Time by 78% for Industrial Semantic Segmentation

Artificial Intelligence #semantic segmentation#annotation

Researchers have demonstrated that unsupervised computer vision algorithms can reduce the annotation time for semantic segmentation tasks in industrial materials science by 78%, from 170 hours to 37 hours. The team created the largest public steel microstructure segmentation dataset and a benchmark deep learning model, validated by field experts and deployed in an industrial setting.

Vero: An Open RL Recipe for General Visual Reasoning — A Fully Open Vision-Language Model Family

Artificial Intelligence #ver0#open rl recipe

A new research paper introduces Vero, a family of fully open vision-language models (VLMs) that use reinforcement learning (RL) to achieve strong general visual reasoning. The team constructed a 600K-sample dataset from 59 datasets and designed task-routed rewards. Vero variants outperformed their base models by 2.9-5.4 points on average across a 30-benchmark suite, and the best variant surpassed a stronger closed model by 3.8 points. All code, data, and models are released publicly.

TerraMind: First Any-to-Any Generative Multimodal Foundation Model for Earth Observation

Artificial Intelligence #earth observation#satellite imagery

Researchers have introduced TerraMind, the first any-to-any generative, multimodal foundation model for Earth observation (EO). Pretrained on dual-scale representations across nine geospatial modalities, it surpasses the prior state of the art on the PANGAEA benchmark and introduces a novel 'Thinking-in-Modalities' capability.

AI Method Overcomes Labelled Data Scarcity for Defect Classification in STM

Artificial Intelligence #defect classification#scanning tunneling microscopy

Scanning tunneling microscopy image analysis traditionally requires extensive manual labeling. A new approach combines few-shot learning and unsupervised learning to automate defect classification, achieving high accuracy on multiple surfaces with minimal labelled data. The model can adapt to unseen surfaces with as few as one additional data point.

STAR Allocation Method Improves Text-to-Image AI Training with Spatiotemporal Rewards

Artificial Intelligence #artificial intelligence#text-to-image

A new method called SpatioTemporal Adaptive Reward (STAR) Allocation improves reinforcement learning post-training for text-to-image generation. By using text-image attention to allocate rewards to relevant latent regions, STAR enhances compositional semantic alignment, text rendering, and preference optimization without changing the external reward source. The method was validated on Stable Diffusion 3.5 Medium, achieving top scores on GenEval, OCR, and PickScore benchmarks.

New Tokenization Method Merges Tokens to Improve Diffusion Transformer Efficiency

Artificial Intelligence #variable-length tokenization#learnable global merging

A research paper introduces a variable-length tokenizer that merges tokens instead of truncating them, enabling adaptive compression for diffusion transformers. The method, called learnable global merging, addresses representational alignment issues across token lengths and achieves a superior trade-off between image quality (gFID) and computational cost.

CSWinUNETR: Deep Learning Model Segments Thin Anatomical Structures with Cross-Shaped Self-Attention

Artificial Intelligence #medical image segmentation#deep learning

Researchers propose CSWinUNETR, a deep learning backbone for 2D and 3D segmentation of thin anatomical structures such as retinal vessels, cerebral vasculature, and facial wrinkles. The model employs cross-shaped stripe self-attention, cyclic shifts, and sparse-control dynamic snake convolution to improve segmentation accuracy. It outperforms state-of-the-art methods on four benchmarks without task-specific post-processing.

See-and-Reach: Researchers Propose 3DG-VLN for Precise UAV Vision-Language Navigation Within Field of View

Artificial Intelligence #see-and-reach#vision-language

Researchers introduce UAV-VLN-FOV, a target-visible navigation task that isolates the see-and-reach stage for UAVs, and propose 3DG-VLN, a vision-language waypoint prediction framework that uses dynamic 3D direction cues. The framework achieves a 13.82% improvement in success rate over baselines on a new benchmark of 2,717 trajectories.

Interpretable Sperm Morphology Classification via Attention-Guided Deep Learning

Artificial Intelligence #deep learning#sperm morphology

A study proposes an interpretable deep learning framework combining EfficientNet-B0 with a Convolutional Block Attention Module for sperm morphology classification, achieving 90.2% and 93.9% accuracy on SMIDS and HuSHem datasets respectively.

Spatial-Aware Reduction Framework Boosts Efficiency and Accuracy of Visual State Space Models

Artificial Intelligence #spatial-aware#reduction framework

Researchers propose STORM, a spatial-aware token reduction framework for visual state space models. It maintains structural integrity during compression, achieving state-of-the-art pruning accuracy. On VMamba, STORM recovers up to 63.3% of top-1 accuracy, with only a 1.0% drop on PlainMamba.

PrototypeNAS: Zero-Shot Neural Architecture Search for Microcontroller-Based Edge AI

Artificial Intelligence #prototypenas#neural architecture search

PrototypeNAS is a zero-shot neural architecture search method that rapidly designs deep neural networks for microcontroller units. It uses a novel three-step search process to compress and specialize DNNs without training from scratch, achieving accuracy comparable to large models in minutes.

ParaScale: A Gauge-Invariant Approach to Scale-Calibrated Camera-Motion Transfer

Artificial Intelligence #parascale#camera-motion

Researchers present ParaScale, a plug-and-play module that calibrates camera-motion transfer between videos of vastly different scales using a gauge-invariant parallax number. It reduces parallax consistency error by more than 3x over uncalibrated methods.

PerceptionDLM: Multimodal Diffusion Model Achieves Parallel Region Perception

Artificial Intelligence #artificial intelligence#computer vision

Researchers propose PerceptionDLM, a multimodal diffusion language model optimized for parallel region perception. Built on the state-of-the-art baseline PerceptionDLM-Base, it uses efficient prompting and structured attention masking to generate descriptions for multiple masked regions simultaneously, significantly improving inference efficiency. The team also introduces the ParaDLC-Bench benchmark to evaluate parallelism in visual perception.

New Multi-Agent AI Pipeline Delivers Auditable Financial Chart QA with On-Premise Deployment

Artificial Intelligence #agentfinvqa#multi-agent

A new multi-agent AI pipeline called AgentFinVQA enables auditable financial chart question answering with on-premise deployment, improving accuracy by up to 7.68 percentage points over baselines and providing a verifier-based confidence signal for human-in-the-loop review. The system records every step in a traceable Model Evaluation Packet.

Triangular Consistency Constraint Offers Universal Plug-and-Play Component for Optical Flow Learning

Artificial Intelligence #optical flow#computer vision

Researchers propose triangular consistency, a first-principled constraint for optical flow that is agnostic to network architecture, supervision type, and dataset. The constraint composes two flows to induce a third and enforces consistency, showing consistent improvement across supervised, unsupervised, and transfer learning with negligible computational overhead.
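
To make "composes two flows to induce a third" concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the function names, the nearest-pixel sampling, the border clipping, and the mean endpoint-distance residual are all illustrative assumptions. It chains an A-to-B flow with a B-to-C flow and measures how far the result is from a directly estimated A-to-C flow.

```python
import math

def compose_flows(flow_ab, flow_bc):
    """Chain two dense flows stored as H x W grids of (dx, dy) tuples.

    Hypothetical helper: the B->C flow is sampled at the nearest pixel to
    where each A pixel lands in B, clipped to the image border.
    """
    h, w = len(flow_ab), len(flow_ab[0])
    composed = []
    for y in range(h):
        row = []
        for x in range(w):
            dx, dy = flow_ab[y][x]
            xb = min(max(round(x + dx), 0), w - 1)
            yb = min(max(round(y + dy), 0), h - 1)
            ex, ey = flow_bc[yb][xb]
            row.append((dx + ex, dy + ey))
        composed.append(row)
    return composed

def triangular_residual(flow_ab, flow_bc, flow_ac):
    """Mean endpoint distance between the composed flow and the direct A->C flow."""
    composed = compose_flows(flow_ab, flow_bc)
    total, count = 0.0, 0
    for row_c, row_d in zip(composed, flow_ac):
        for (cx, cy), (ax, ay) in zip(row_c, row_d):
            total += math.hypot(cx - ax, cy - ay)
            count += 1
    return total / count

# Constant shifts: one pixel right, then two pixels down.
flow_ab = [[(1.0, 0.0)] * 5 for _ in range(4)]
flow_bc = [[(0.0, 2.0)] * 5 for _ in range(4)]
flow_ac = [[(1.0, 2.0)] * 5 for _ in range(4)]
print(triangular_residual(flow_ab, flow_bc, flow_ac))  # 0.0
```

In a training setup, a residual of this kind would typically be added as an extra loss term using differentiable (for example bilinear) sampling; the nearest-pixel version here only illustrates the geometry.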

BrainG3N Tokenizer Enables Controllable 3D Brain MRI Generation with Clinical-Grade Embeddings

Artificial Intelligence #ai#artificial intelligence

BrainG3N, a novel tokenizer for 3D brain MRI latent diffusion, decouples encoder and decoder to preserve clinical information while enabling high-quality reconstruction. Pretrained on 35,309 volumes, it outperforms SOTA models on 21 of 23 clinical tasks and supports controllable generation for disease simulation and privacy-preserving data sharing.

First Billion-Parameter Generative Foundation Model for Chest Radiography Achieves Expert-Level Synthesis Fidelity

Artificial Intelligence #generative models#foundation models

Ribeiro et al. present the largest specialist generative foundation model for chest radiographs, with over 1.3 billion parameters. Trained on 1.2 million radiographs, the model supports controllable generation across demographics, views, and pathologies, advancing synthesis fidelity to clinical indistinguishability.

New Benchmark Reveals Remote Sensing AI Models Fail at Negation Comprehension

Artificial Intelligence #ai#multimodal

A new study introduces RS-Neg, the first benchmark to evaluate negation comprehension in remote sensing multimodal large language models. The evaluation reveals that advanced models exhibit hallucinations and performance degradation when handling negation. The proposed NeFo method, using about 5% unlabeled test samples, significantly improves negation understanding, with implications for critical applications like emergency response and logistics.

New AI Framework PSCT-Net Reduces Radiation Risk in Pediatric Skull CT Imaging

Artificial Intelligence #pediatric#skull

PSCT-Net is a novel deep learning framework for reconstructing 3D CT scans of pediatric skulls from only two X-ray images, significantly reducing radiation exposure. The method uses differentiable back-projection and attention-guided refinement to overcome depth ambiguity. It was evaluated on a private dataset called PedSkull-CT.

QueryGaussian: Training-Free 3D Instance Retrieval Cuts GPU Memory by 70%, Speeds Inference 180x

Artificial Intelligence #technology#artificial intelligence

QueryGaussian, a new training-free framework for open-vocabulary 3D instance retrieval, reduces GPU memory usage by more than 70% and accelerates inference by 180x compared to existing methods, enabling city-scale scenes on consumer-grade hardware.

New Temporal Pyramid Model Enhances Spoofed Speech Detection for Voice Security Systems

Artificial Intelligence #spoofed speech#speech detection

Researchers introduced a Temporal Pyramid Adapter for spoofed speech detection that uses parallel temporal convolutions with varying receptive fields to capture multi-scale cues. The model achieved a 99.24% AUC and 3.87% EER on the PartialSpoof dataset, significantly outperforming existing methods like LCNN-BLSTM (9.87% EER) and TRACE (8.08% EER). The work highlights the potential for improving voice authentication security but notes performance degradation under domain and language shifts.

New AI Framework Synthesizes Fluorescein Angiography from Fundus and Sparse OCT Scans

Artificial Intelligence #medical imaging#fluorescein angiography

A research team led by Ma introduced a novel deep learning framework that synthesizes fundus fluorescein angiography (FFA) from color fundus photography (CFP) using structural guidance from sparse optical coherence tomography (OCT) scans. The method uses a tri-modally aligned dataset of 3,676 patient eyes and achieves superior synthesis and downstream diagnosis performance compared to existing methods.

SLUM-i: AI Semi-Supervised Learning Maps Informal Settlements with Benchmark Dataset

Artificial Intelligence #semi-supervised learning#urban mapping

A new AI framework called SLUM-i uses semi-supervised learning to map informal settlements in cities like Lahore, Karachi, and Mumbai. It introduces a benchmark dataset and achieves up to +5.9 pp mIoU improvement over existing methods.

New AI Research Analyzes When Score-Based Models Outperform Traditional Channel Estimation

Artificial Intelligence #score-based generative models#channel estimation

A new paper from Skocaj, Eller, and Boban provides a theoretically grounded analysis of score-based generative models for channel estimation in wireless communications. The study uses the perception-distortion tradeoff to reveal when score-matching offers advantages over traditional discriminative learning, with numerical results showing benefits under high predictive uncertainty but recommending simpler approaches otherwise.

SelectStream: A Selective Memory Framework for Streaming Video Understanding

Artificial Intelligence #artificial intelligence#streaming video

Researchers propose SelectStream, a selective latent-memory framework for streaming video models that uses surprise-driven adaptive windowing and query-conditioned graph reasoning to allocate memory efficiently. It achieves 82.67% on StreamingBench and 74.4% on offline benchmarks.

New Robotic Architecture AVP Improves Pick-and-Place Success Rate by 37% over Existing Models

Artificial Intelligence #action#visual

A new research paper introduces AVP (Action with Visual Primitives), an end-to-end architecture for robotic manipulation that decouples visual-language reasoning from action generation. In real-robot pick-and-place experiments, AVP achieved a 37.04% higher success rate than the pi_0.5 baseline, with gains in data efficiency, spatial-compositional generalization, and object-level transfer.