Topic

ai research

37 stories

Artificial Intelligence #reinforcement learning#theorem proving

Process-Verified Reinforcement Learning for Theorem Proving via Lean: A New Path to AI Reliability

A new arXiv preprint presents process-verified reinforcement learning for theorem proving, using the Lean proof assistant as a symbolic process oracle. By parsing proof attempts into tactic sequences and leveraging Lean's type-theoretic feedback, the method delivers dense, verifier-grounded credit signals. Experiments with STP-Lean and DeepSeek-Prover-V1.5 show tactic-level supervision outperforms outcome-only baselines on MiniF2F and ProofNet benchmarks.

Jul 8, 2026 2 sources
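The difference between dense tactic-level credit and a single outcome reward, as described in the preprint summary above, can be illustrated with a minimal Python sketch. It assumes each tactic in an attempt has already been replayed in Lean and reduced to a hypothetical `TacticResult` verdict. The reward values (`step_bonus`, `error_penalty`, `success`) are illustrative choices, not the preprint's reward scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TacticResult:
    """Hypothetical verdict for one tactic, e.g. from replaying it in Lean."""
    tactic: str
    accepted: bool             # Lean elaborated the tactic without error
    closes_goal: bool = False  # no goals remain after this tactic


def outcome_reward(results):
    """Outcome-only baseline: one 0/1 signal for the whole proof attempt."""
    if not results:
        return 0.0
    ok = all(r.accepted for r in results) and results[-1].closes_goal
    return 1.0 if ok else 0.0


def tactic_rewards(results, step_bonus=0.1, error_penalty=-0.5, success=1.0):
    """Dense credit: one reward per tactic, derived from the verifier's verdicts.

    Tactics after the first rejected one are treated as unverified and get 0.
    """
    rewards = []
    failed = False
    for r in results:
        if failed:
            rewards.append(0.0)
        elif not r.accepted:
            rewards.append(error_penalty)
            failed = True
        elif r.closes_goal:
            rewards.append(success)
        else:
            rewards.append(step_bonus)
    return rewards


attempt = [
    TacticResult("intro n", True),
    TacticResult("induction n", True),
    TacticResult("simp", False),
    TacticResult("omega", True, closes_goal=True),
]
print(outcome_reward(attempt))  # 0.0
print(tactic_rewards(attempt))  # [0.1, 0.1, -0.5, 0.0]
```

Although the attempt fails, the dense signal still credits the two accepted steps and localizes the error to `simp`. An outcome-only reward discards that information.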

Yann LeCun's new AI startup AMI Labs raises $1bn to build flexible intelligence beyond LLMs

Technology

Artificial Intelligence #artificial intelligence#yann lecun

Yann LeCun, Meta's former chief AI scientist, has founded AMI Labs to develop an AI architecture called JEPA (Joint Embedding Predictive Architecture), which aims to overcome the limitations of large language models (LLMs) in understanding the physical world. The startup raised over $1bn in seed funding from Nvidia and Jeff Bezos' private investment fund, marking one of Europe's largest seed rounds.

Jul 2, 2026 1 source

ScaleWoB Framework Synthesizes Realistic Environments to Evaluate GUI Agents at Scale

Technology

Artificial Intelligence #gui agents#coding agents

ScaleWoB is a new framework that generates high-fidelity synthesized interactive environments for evaluating GUI agents across mobile, desktop, and automotive platforms. It includes 100+ environments and 1000+ verifiable tasks. Experiments on five state-of-the-art mobile GUI agents show an average success rate of only 27.92%, compared to 92.08% for humans, highlighting substantial room for improvement.

Jun 22, 2026 1 source

Beyond Reasoning Gains: Mitigating General-Capability Forgetting in Large Reasoning Models

Technology

Artificial Intelligence #artificial intelligence#large reasoning models

A new paper posted on arXiv shows that reinforcement learning with verifiable rewards (RLVR) can cause large reasoning models to forget foundational capabilities such as perception and faithfulness. The authors propose RECAP, a replay strategy with dynamic objective reweighting that preserves general knowledge while retaining the reasoning gains.

Jun 21, 2026 1 source

FreeStyle: Scalable Style-Content Dual-Reference Generation via Community LoRA Mining

Technology

Artificial Intelligence #generative ai#lora

FreeStyle is a scalable dual-reference generation framework that leverages community LoRAs as compositional anchors for style and content. It introduces a two-stage curriculum with attention-level enrichment and frequency-aware RoPE modulation to suppress leakage from style references. The framework is evaluated on a new benchmark covering style similarity, content preservation, and leakage rejection, achieving a strong balance among these objectives.

Jun 21, 2026 1 source

SL-S4Wave: Self-Supervised Learning Framework Improves ECG and EEG Analysis with State Space Models

Technology

Artificial Intelligence #self-supervised learning#physiological waveforms

Researchers propose SL-S4Wave, a self-supervised learning framework combining contrastive learning with structured state space models (S4) to analyze long-sequence physiological waveforms. The model outperforms state-of-the-art baselines in arrhythmia detection and EEG tasks, demonstrates strong label efficiency, and generalizes to unseen arrhythmia types.

Jun 21, 2026 1 source

Transformer Feed-Forward Block Linearity: Learned, Not Architectural, According to New Research

Technology

Artificial Intelligence #transformer#feed-forward

A new study introduces R^2_lin, a measure of linearity for transformer feed-forward blocks. Across models like GPT-2 and Pythia-160m, R^2_lin varies widely and is not determined by activation function. The findings offer targeted compression signals and reveal pitfalls in training linear baselines.

Jun 20, 2026 1 source
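A one-unit toy shows why a block's linearity can be learned rather than fixed by its activation function. This sketch assumes R^2_lin is the coefficient of determination of a least-squares affine fit of a block's outputs on its inputs. The paper's exact definition, for example over full hidden-state vectors, may differ. The same ReLU block is perfectly linear or clearly nonlinear depending only on its bias and the input range.

```python
def r2_linear(xs, ys):
    """R^2 of the best affine fit y ≈ a*x + b (ordinary least squares, 1-D)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def relu_block(x, bias, w_in=1.0, w_out=1.0):
    """Toy one-unit feed-forward block: w_out * relu(w_in * x + bias)."""
    return w_out * max(0.0, w_in * x + bias)


xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
for bias in (5.0, 0.0):
    ys = [relu_block(x, bias) for x in xs]
    print(bias, round(r2_linear(xs, ys), 4))
```

With bias 5.0 the pre-activation never crosses zero on these inputs, so the fit is exact and R^2 is 1. With bias 0.0 the ReLU kink falls inside the input range and R^2 drops to about 0.78, even though the architecture is unchanged.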

New Framework MACR Resolves Knowledge Conflicts in LLMs Using Multi-Agent Reasoning

Technology

Artificial Intelligence #llms#knowledge conflict

A research paper proposes MACR, a novel framework for resolving knowledge conflicts in large language models (LLMs). Unlike existing approaches that privilege either internal parametric knowledge or external context, MACR combines adaptive knowledge assessment with a multi-agent reasoning system to explicitly identify and resolve inconsistencies. Empirical results show that MACR significantly outperforms state-of-the-art baselines while providing interpretable conflict resolutions.

Jun 20, 2026 1 source

Argent Signaling Protocol Mitigates Semantic Drift in Multi-Agent AI Systems

Technology

Artificial Intelligence #multi-agent systems#semantic drift

Researchers introduce the Argent Signaling Protocol (ASP), a machine-readable header that tags AI responses with certainty, grounding, stochasticity, and assumption indices. In tests on document-grounded QA, ASP improved pass rates from 11.1% to 33.3% on a small model and blocked 100% of ungrounded outputs in multi-agent mode.

Jun 20, 2026 1 source
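A grounding gate of the kind described for multi-agent mode takes only a few lines of Python. The header syntax (`ASP key=value ...` on the first line of a response), the field spelling, and the `min_grounding` threshold below are assumptions for illustration. The summary does not give the protocol's actual wire format.

```python
from dataclasses import dataclass

FIELDS = ("certainty", "grounding", "stochasticity", "assumption")


@dataclass(frozen=True)
class SignalHeader:
    """Hypothetical machine-readable indices attached to one AI response."""
    certainty: float
    grounding: float
    stochasticity: float
    assumption: float


def parse_header(line):
    """Parse a hypothetical 'ASP key=value ...' header line."""
    tag, *pairs = line.split()
    if tag != "ASP":
        raise ValueError(f"not an ASP header: {line!r}")
    fields = dict(pair.split("=", 1) for pair in pairs)
    return SignalHeader(**{name: float(fields[name]) for name in FIELDS})


def gate(response, min_grounding=0.5):
    """Pass the response body downstream only if its grounding index is high enough."""
    header_line, _, body = response.partition("\n")
    header = parse_header(header_line)
    return body if header.grounding >= min_grounding else None


grounded = "ASP certainty=0.9 grounding=0.8 stochasticity=0.1 assumption=0.2\nThe contract ends in 2027."
ungrounded = "ASP certainty=0.9 grounding=0.1 stochasticity=0.4 assumption=0.7\nThe contract ends in 2030."
print(gate(grounded))    # The contract ends in 2027.
print(gate(ungrounded))  # None
```

Because the indices are structured fields rather than hedges in prose, a downstream agent can enforce the policy mechanically instead of interpreting the upstream model's wording.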

LLM-Based A/B Testing Needs Calibration: Raw LLM Predictions Recover Only 39% of the Human Treatment Effect

Technology

Artificial Intelligence #llm#a/b testing

A new arXiv paper develops a statistical framework for using large language models (LLMs) as surrogates for human participants in A/B tests. Adapting surrogate endpoint theory, the framework shows that raw LLM predictions recover only 39% of the human treatment effect, but that calibration can close the gap. The study cautions that LLM-based A/B testing yields correct results only by assumption, whereas human testing is correct by design.

Jun 20, 2026 1 source
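The 39% figure implies a simple arithmetic correction if the relationship between LLM and human effects is stable. This sketch assumes a proportional surrogate model, LLM effect ≈ r × human effect. It estimates r by least squares through the origin on hypothetical past experiments that ran both arms. The paper's calibration procedure may be more elaborate, and the study's caveat still applies: the corrected estimate is only right if the assumed relationship holds for the new test.

```python
def recovery_ratio(llm_effects, human_effects):
    """Least-squares slope r in llm ≈ r * human (regression through the origin)."""
    num = sum(l * h for l, h in zip(llm_effects, human_effects))
    den = sum(h * h for h in human_effects)
    return num / den


def calibrated_effect(llm_effect, ratio):
    """Rescale a raw LLM-estimated treatment effect to the human scale."""
    return llm_effect / ratio


# Hypothetical past experiments where both human and LLM arms were run.
human = [2.0, 4.0, 5.0]
llm = [0.78, 1.56, 1.95]  # each is 39% of the corresponding human effect
r = recovery_ratio(llm, human)
print(round(r, 2))                           # 0.39
print(round(calibrated_effect(1.17, r), 2))  # 3.0
```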

New Method Improves Confidence Calibration for Medical Multimodal LLMs by 40%

Technology

Artificial Intelligence #large language models#multimodal

A new study presents the first comprehensive analysis of confidence calibration in medical multimodal large language models (MLLMs). The proposed method, combining Multi-Strategy Fusion-Based Interrogation (MS-FBI) with auxiliary expert LLM assessment, reduces Expected Calibration Error by an average of 40% across three Medical Visual Question Answering datasets, improving reliability for AI-assisted diagnosis.

Jun 20, 2026 1 source
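The study reports Expected Calibration Error (ECE). ECE is conventionally computed by binning predictions by confidence and averaging the gap between accuracy and mean confidence in each bin, weighted by bin size. The following is a minimal Python version with equal-width bins. The default of 10 bins is common, but it is not necessarily the setting used in the paper.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width confidence bins over [0, 1]."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((conf, ok))
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(1 for _, ok in b if ok) / len(b)
            ece += len(b) / n * abs(accuracy - avg_conf)
    return ece


overconfident = expected_calibration_error([0.9] * 4, [True, True, False, False])
calibrated = expected_calibration_error([0.75] * 4, [True, True, True, False])
print(round(overconfident, 2))  # 0.4
print(round(calibrated, 2))     # 0.0
```

A 40% reduction in ECE means the weighted confidence-accuracy gap shrinks to 60% of its previous size, for example from 0.20 to 0.12.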

How Do Instructions Shape Speech? New Cross-Attention Attribution Method Reveals Style Control in TTS

Technology

Artificial Intelligence #text-to-speech#style-captioned

A research paper introduces cross-attention attribution for style-captioned text-to-speech, adapting the DAAM framework to speech diffusion models. The method extracts per-token heatmaps across layers and steps, analyzing 3,600 combinations to reveal how caption tokens influence waveforms. Key findings include lower temporal variance for style tokens, correlation with F0 and energy, and peak style conditioning in early ODE steps and deep layers.

Jun 20, 2026 2 sources

Unified Causal-Origin Taxonomy for Distributional Shifts in Reinforcement Learning Systems

Technology

Artificial Intelligence #reinforcement learning#distributional shifts

A research paper on arXiv presents a unified causal-origin taxonomy for distributional shifts in reinforcement learning (RL). Formalized with a Partially Observable Markov Decision Process (POMDP), the taxonomy categorizes shifts as internal (agent-driven) or external (environment-driven), and as explicit, implicit, or hybrid based on a shifted-time boundary. An accompanying evaluation framework measures performance degradation and recovery. The work provides a systematic foundation for analyzing the robustness of RL systems under changing conditions.

Jun 17, 2026 2 sources

Agent Rosetta: How an LLM Agent Masters Protein Design for Specialized Scientific Tasks

Technology

Artificial Intelligence #protein design#agent rosetta

Researchers introduce Agent Rosetta, an LLM-based agent integrated with the Rosetta software environment to automate complex protein design tasks. The agent achieves performance comparable to specialized ML models and human experts on canonical amino acids, and excels on non-canonical residues where standard ML fails. The study highlights the critical role of environment design in enabling LLM agents to operate specialized scientific software.

Jun 17, 2026 1 source

MuVAP: New AI Model Predicts Turn-Taking in Multiparty Conversations Using Audio and Video

Technology

Artificial Intelligence #voice activity projection#turn-taking prediction

Researchers introduce MuVAP, a causal multimodal framework that predicts turn-taking in multiparty conversations using monaural audio and a single camera. The model extends Voice Activity Projection by grounding acoustic predictions in face tracks, and a new 31-hour corpus of unedited conversations supports training.

Jun 17, 2026 1 source

Research Proposes Task-Based Neurons to Enhance Neural Network Feature Representation

Technology

Artificial Intelligence #artificial intelligence#neural networks

A study published on arXiv introduces a framework for designing task-based neurons inspired by the neuronal diversity of the human brain. The neurons use polynomials as base functions, and experiments on synthetic data, classic benchmarks, and real-world applications show performance competitive with state-of-the-art models.

Jun 16, 2026 1 source

New Framework TRACED Evaluates LLM Reasoning Using Geometric Stability and Progress

Technology

Artificial Intelligence #llms#reasoning

A new research framework called TRACED evaluates LLM reasoning quality by analyzing the geometric progress and stability of reasoning traces. It distinguishes correct reasoning from hallucinations based on trajectory patterns, offering a more robust evaluation signal than scalar probability scores.

Jun 16, 2026 1 source

New Unified Definition of AI Hallucination Pins It on Inaccurate World Modeling

Technology

Artificial Intelligence #hallucination#artificial intelligence

A new arXiv paper by Liu et al. proposes a unified definition of hallucination in large language models: inaccurate internal world modeling that is observable to the user. The framework subsumes prior definitions, distinguishes true hallucinations from planning or reward errors, and introduces the HalluWorld benchmark for stress-testing models.

Jun 16, 2026 1 source

Attention, Not Model Scale, Drives Human-AI Alignment in Multimodal Language Prediction, Research Finds

Technology

Artificial Intelligence #attention#scale

A study comparing five vision-language models with 600 human participants found that adding visual context significantly improved human-AI alignment in language prediction, with attention maps explaining up to 70% of inter-participant variance. The research indicates that attention to informative cues, not model scale, is the primary driver of alignment.

Jun 16, 2026 1 source

Z-Plane Neural Networks Replace ReLU and LayerNorm with Bounded Geometric Activation

Technology

Artificial Intelligence #z-plane neural networks#bounded geometric activation

Researchers propose Z-Plane Neural Networks, which replace standard ReLU activations and LayerNorm with a bounded geometric activation called Radial Bounding. The activation is 1-Lipschitz, prevents vanishing gradients, and preserves directional information. A 100-layer Z-Plane MLP reached 98.34% accuracy on MNIST without any ReLU or LayerNorm, demonstrating numerical stability at depth.

Jun 16, 2026 1 source
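The summary does not give the Radial Bounding formula. One simple map with the three stated properties (bounded output, 1-Lipschitz, direction preserved) is projection onto a ball of fixed radius. The sketch below uses that projection as an assumption, not as the paper's definition. Projection onto a closed convex set is non-expansive, which is where the 1-Lipschitz property comes from, and the random loop only spot-checks it.

```python
import math
import random


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def radial_bound(v, radius=1.0):
    """Assumed stand-in for Radial Bounding: project v onto the ball of the given radius.

    Vectors already inside the ball pass through unchanged; longer vectors keep
    their direction but are shortened to exactly `radius`.
    """
    n = norm(v)
    if n <= radius:
        return list(v)
    return [x * radius / n for x in v]


def distance(a, b):
    return norm([x - y for x, y in zip(a, b)])


random.seed(0)
for _ in range(1000):
    a = [random.gauss(0, 3) for _ in range(8)]
    b = [random.gauss(0, 3) for _ in range(8)]
    fa, fb = radial_bound(a), radial_bound(b)
    assert norm(fa) <= 1.0 + 1e-12                     # bounded
    assert distance(fa, fb) <= distance(a, b) + 1e-12  # 1-Lipschitz
    cos = sum(x * y for x, y in zip(a, fa)) / (norm(a) * norm(fa))
    assert abs(cos - 1.0) < 1e-9                       # direction preserved
print(radial_bound([3.0, 4.0]))  # [0.6, 0.8]
```

Inside the ball this map is the identity, so gradients pass through unchanged there. This is one way a bounded activation can avoid the saturation that makes sigmoid-like functions prone to vanishing gradients.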

New Drift-RAE Method Efficiently Distills Flow Models in Representation Autoencoder Latent Spaces

Technology

Artificial Intelligence #transformers#representation autoencoders

A new research paper proposes Drift-RAE, a method for distilling pretrained flow models in the latent spaces of representation autoencoders (RAEs). It addresses challenges posed by anisotropy and large curvature, achieving an FID of 1.77 on ImageNet 256×256 with only 10,000 distillation steps and outperforming existing RAE distillation methods.

Jun 16, 2026 1 source

New Research Demystifies Variance in Circuit Discovery of Large Language Models

Technology

Artificial Intelligence #llms#circuit discovery

A new research paper examines variance in circuit discovery for large language models and identifies three sources: resampling, rephrasing, and sample-wise variance. The authors propose CEAP, an improvement on EAP-IG with theoretical guarantees, and argue that rephrasing variance makes comprehensive circuits hard to find, suggesting that LLMs may be inherently difficult to steer.

Jun 16, 2026 1 source

PISA Memory System Draws on Cognitive Psychology to Boost AI Agent Adaptability

Technology

Artificial Intelligence #pisa#unified memory system

Researchers propose PISA, a pragmatic psych-inspired unified memory system for AI agents that treats memory as a constructive process. It introduces a trimodal adaptation mechanism and hybrid memory access architecture, achieving state-of-the-art results on LOCOMO and the new AggQA benchmark.

Jun 16, 2026 1 source

Gen-VCoT: New Framework Generates RGB Images as Visual Chain-of-Thought Intermediates for Multimodal AI Reasoning

Technology

Artificial Intelligence #generative ai#visual reasoning

Researchers propose Gen-VCoT, a framework that generates RGB images as visual chain-of-thought intermediates, improving spatial reasoning by 25% and depth reasoning by 50% over baseline MLLMs, though text-based CoT remains superior for simple factual queries.

Jun 16, 2026 1 source

Open Science Gains Ground: 10-Year AI Study Shows Sharp Rise in Code and Data Sharing

Technology

Artificial Intelligence #open science#ai research

A decade-long analysis of 56,800 AI conference papers shows documentation practices improving dramatically, with code and data sharing rising nearly sixfold, from 11% to 64%. Estimated reproducibility also rose, from 28% to 64%, and these improvements predated formal reproducibility checklists.

Jun 16, 2026 1 source

New Visualization Framework Reveals Spatial Sources of Uncertainty in Deep Learning Models

Technology

Artificial Intelligence #deep learning#reinforcement learning

Researchers propose a novel framework called Uncertainty Activation Map (UAM) that visualizes two types of uncertainty – vacuity (lack of evidence) and dissonance (conflicting evidence) – at pixel level. Combining Evidential Deep Learning (EDL) with Full-Gradient Class Activation Mapping (FullGrad), UAM provides theoretically grounded spatial maps to help identify when and why deep neural networks are uncertain, a critical capability for deploying reliable AI in safety-critical domains.

Jun 16, 2026 2 sources
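Vacuity and dissonance have standard definitions in evidential deep learning via subjective logic. Given non-negative class evidence e_k for K classes, the Dirichlet strength is S = Σ(e_k + 1), the beliefs are b_k = e_k / S, and the vacuity is u = K / S. Dissonance sums each belief weighted by how balanced it is against the other beliefs. The sketch below computes both quantities for a toy per-pixel evidence map. It omits UAM's FullGrad attribution step and is not the paper's implementation.

```python
def subjective_opinion(evidence):
    """Beliefs b_k = e_k / S and vacuity u = K / S, with S = sum(e_k + 1)."""
    k = len(evidence)
    strength = sum(evidence) + k
    beliefs = [e / strength for e in evidence]
    return beliefs, k / strength


def balance(bj, bk):
    """Relative balance of two beliefs: 1 when equal, 0 when one of them is zero."""
    total = bj + bk
    return 1.0 - abs(bj - bk) / total if total > 0 else 0.0


def dissonance(beliefs):
    """Each belief times its belief-weighted mean balance against the other beliefs."""
    total = 0.0
    for k, bk in enumerate(beliefs):
        others = [bj for j, bj in enumerate(beliefs) if j != k]
        denom = sum(others)
        if denom > 0:
            total += bk * sum(bj * balance(bj, bk) for bj in others) / denom
    return total


def uncertainty_maps(evidence_map):
    """Per-pixel vacuity and dissonance maps for a 2-D grid of class-evidence vectors."""
    vacuity_map, dissonance_map = [], []
    for row in evidence_map:
        vac_row, dis_row = [], []
        for evidence in row:
            beliefs, u = subjective_opinion(evidence)
            vac_row.append(u)
            dis_row.append(dissonance(beliefs))
        vacuity_map.append(vac_row)
        dissonance_map.append(dis_row)
    return vacuity_map, dissonance_map


# 1x3 toy image, 2 classes: no evidence, conflicting evidence, one-sided evidence.
vac, dis = uncertainty_maps([[[0.0, 0.0], [10.0, 10.0], [20.0, 0.0]]])
print([round(v, 3) for v in vac[0]])  # [1.0, 0.091, 0.091]
print([round(d, 3) for d in dis[0]])  # [0.0, 0.909, 0.0]
```

The first pixel is uncertain because it lacks evidence (high vacuity). The second is uncertain because its evidence conflicts (high dissonance). These are the two failure modes UAM separates spatially.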

ReGrad: A New AI Paradigm for Continual Learning Without Catastrophic Forgetting

Technology

Artificial Intelligence #machine learning#continual learning

A new paper introduces ReGrad (Retrievable Gradients), a paradigm for continual post-training that pre-computes document-specific gradients, stores them in a Gradient Bank, and retrieves query-relevant gradients at inference time for temporary weight adaptation. The method uses bi-level meta-learning to reshape gradients into generalizable signals, outperforming CPT and RAG baselines in experiments.

Jun 16, 2026 1 source
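The retrieve-then-adapt step can be sketched with plain Python lists. Every piece here is a hypothetical simplification:

- documents are represented by key embeddings;
- the Gradient Bank is a list of (key, gradient) pairs over a tiny weight vector;
- retrieval is top-k cosine similarity;
- adaptation is one gradient step per retrieved entry, applied to a copy of the weights.

The bi-level meta-learning that shapes ReGrad's stored gradients is not modeled.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class GradientBank:
    """Hypothetical store of precomputed, document-specific gradients keyed by embeddings."""

    def __init__(self):
        self.entries = []  # list of (key_embedding, gradient) pairs

    def add(self, key, gradient):
        self.entries.append((list(key), list(gradient)))

    def retrieve(self, query, k=1):
        ranked = sorted(self.entries, key=lambda e: cosine(query, e[0]), reverse=True)
        return [grad for _, grad in ranked[:k]]


def adapted_weights(weights, gradients, lr=0.1):
    """Temporary weights w - lr * sum(g); the caller's base weights are not modified."""
    adapted = list(weights)
    for g in gradients:
        adapted = [w - lr * gi for w, gi in zip(adapted, g)]
    return adapted


bank = GradientBank()
bank.add([1.0, 0.0], [1.0, 0.0])  # gradient precomputed from document A
bank.add([0.0, 1.0], [0.0, 2.0])  # gradient precomputed from document B
base = [1.0, 1.0]
temp = adapted_weights(base, bank.retrieve([0.9, 0.1], k=1), lr=0.5)
print(temp, base)  # [0.5, 1.0] [1.0, 1.0]
```

In this sketch the adapted copy is discarded after the query is answered, so the base weights never drift. That is how a retrieval-based design of this kind can sidestep catastrophic forgetting.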

VibeThinker-3B: Small Language Model Matches Giants in Verifiable Reasoning, According to arXiv Paper

Technology

Artificial Intelligence #vibethinker-3b#small language model

A new technical report on arXiv introduces VibeThinker-3B, a compact 3B-parameter language model that achieves verifiable reasoning scores comparable to models orders of magnitude larger, including DeepSeek V3.2, GLM-5, and Gemini 3 Pro. The model uses a Spectrum-to-Signal post-training paradigm and achieves 94.3 on AIME26 and 80.2% Pass@1 on LiveCodeBench v6.

Jun 16, 2026 1 source

You Don't Need Strong Assumptions: Visual Representation Learning via Temporal Differences

Technology

Artificial Intelligence #visual representation learning#temporal differences

A new research paper introduces Temporal Difference in Vision (TDV), a self-supervised learning method that avoids strong inductive biases such as augmentations or masking. TDV trains an image encoder and a motion encoder to predict the next frame, relying only on the causal assumption that the past causes the future. The method matches state-of-the-art performance on dense spatial tasks, suggesting a new paradigm for visual representation learning.

Jun 16, 2026 1 source

Vernier Research Reveals Why Language Models Give Inconsistent Answers to Causal Questions After Variable Renaming

Technology

Artificial Intelligence #artificial intelligence#causal reasoning

Researchers introduce Vernier, a probing technique that reveals representational misalignment in instruction-tuned language models when variable names are replaced with placeholders, causing inconsistent answers to causal reasoning questions. The study tests models including Qwen-7B, Qwen-14B, and Llama-3.1-8B, and finds that success is bounded by model family, scale, and task.

Jun 16, 2026 1 source

AI Scientist Automates Entire Research Lifecycle, Passes First Peer Review

Technology

Artificial Intelligence #automation#ai research

A new AI system called The AI Scientist can autonomously conduct the entire research lifecycle, from idea generation to manuscript writing and peer review. One of its papers passed the first round of peer review at a workshop of a major machine learning conference; the workshop's acceptance rate was 70%. The system operates in two modes: a focused mode that uses human-provided templates and an open-ended, template-free mode.

Jun 16, 2026 1 source

New DAG-SHAP Method Improves Feature Attribution Using Edge Intervention in Directed Acyclic Graphs

Technology

Artificial Intelligence #feature attribution#directed acyclic graphs

Researchers introduce DAG-SHAP, a feature attribution method for directed acyclic graphs that uses edge intervention to address the limitations of node-centric Shapley value approaches. The method captures both externality and exogenous influence and is validated on real and synthetic datasets.

Jun 16, 2026 1 source
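Exact enumeration on a three-edge linear DAG illustrates what it means to treat edges rather than nodes as Shapley players. The value function here is an assumption for illustration. An edge outside the coalition carries no signal, and the game's value is the output y. DAG-SHAP's actual edge-intervention semantics may differ.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial

# Toy linear DAG: x1 -> x2 -> y and x1 -> y, with edge weights.
EDGES = {("x1", "x2"): 2, ("x2", "y"): 3, ("x1", "y"): 1}
ORDER = ["x1", "x2", "y"]  # topological order
INPUTS = {"x1": 1}         # exogenous input value


def output(active_edges):
    """Value of y when only `active_edges` transmit signal (an assumed edge-level game)."""
    values = dict(INPUTS)
    for node in ORDER:
        if node not in values:
            values[node] = sum(
                w * values[src]
                for (src, dst), w in EDGES.items()
                if dst == node and (src, dst) in active_edges
            )
    return values["y"]


def edge_shapley(edges, value):
    """Exact Shapley values with edges as players (exponential; toy sizes only)."""
    players = list(edges)
    n = len(players)
    phi = {}
    for e in players:
        others = [p for p in players if p != e]
        total = Fraction(0)
        for s in range(n):
            weight = Fraction(factorial(s) * factorial(n - s - 1), factorial(n))
            for subset in combinations(others, s):
                total += weight * (value(set(subset) | {e}) - value(set(subset)))
        phi[e] = total
    return phi


phi = edge_shapley(EDGES, output)
for edge, contribution in phi.items():
    print(edge, contribution)
```

The two edges on the indirect path x1 → x2 → y each receive 3, and the direct edge x1 → y receives 1. The attributions sum to the full output of 7, which is the efficiency property that makes Shapley values attractive for attribution.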

VigilFormer: Deformable Attention for Video Anomaly Detection with Causal Risk Inference

Technology

Artificial Intelligence #video anomaly detection#deformable attention

A new AI framework, VigilFormer, uses deformable attention and causal inference to detect anomalies in surveillance video at 41.5 FPS, outperforming prior methods on three benchmarks.

Jun 16, 2026 1 source

Think-at-Hard: Selective Latent Iterations Boost LLM Reasoning Accuracy by Up to 6.8%

Technology

Artificial Intelligence #artificial intelligence#large language models

A new research paper proposes Think-at-Hard (TaH), a looped transformer that selectively performs latent iterations only on tokens likely to be incorrect. By skipping iterations on 93% of tokens, TaH outperforms always-iterate models by 3.8-4.4% and single-iteration baselines by up to 6.8%, while requiring negligible extra parameters.

Jun 16, 2026 1 source
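The control flow of selective latent iteration can be sketched as follows. Two pieces are assumptions. First, hard tokens are picked with a fixed confidence threshold on the first-pass distribution, because the summary does not describe TaH's actual decision mechanism. Second, `sharpen` is a hypothetical stand-in for one extra latent pass of the looped transformer.

```python
def select_hard_tokens(token_probs, threshold=0.7):
    """Indices of tokens whose top-1 probability is below the threshold."""
    return [i for i, p in enumerate(token_probs) if max(p) < threshold]


def sharpen(p):
    """Hypothetical stand-in for one extra latent iteration: square and renormalize."""
    squared = [x * x for x in p]
    total = sum(squared)
    return [x / total for x in squared]


def think_at_hard(token_probs, refine=sharpen, threshold=0.7, max_iters=2):
    """Apply extra iterations only to hard tokens; easy tokens keep their first-pass output."""
    probs = [list(p) for p in token_probs]
    for _ in range(max_iters):
        hard = select_hard_tokens(probs, threshold)
        if not hard:
            break
        for i in hard:
            probs[i] = refine(probs[i])
    return probs


first_pass = [[0.95, 0.05], [0.6, 0.4], [0.9, 0.1]]
print(select_hard_tokens(first_pass))  # [1]
result = think_at_hard(first_pass)
```

In this toy, two of the three tokens skip the extra work. The story's 93% figure is the same skip ratio measured on real model outputs.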

PACT Hybrid Architecture Combines Small Language Model Planning with Reinforcement Learning for Enhanced Decision-Making

Technology

Artificial Intelligence #artificial intelligence#language models

Researchers propose Plan, Align, Commit, Think (PACT), a hybrid architecture that couples a fast reactive reinforcement learning policy with a slow deliberative small language model (SLM) planner. The SLM asynchronously generates and validates action plans, which are executed directly once verified as safe through simulation. Evaluated on three FrozenLake configurations, PACT outperformed all baselines using a 2B-parameter SLM backbone, demonstrating that deliberative planning and reactive execution complement each other.

Jun 16, 2026 1 source
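The commit-if-verified step can be illustrated on the standard 4x4 FrozenLake map. This sketch assumes deterministic (non-slippery) dynamics, so a single rollout is enough to check a plan. It also assumes plans start from the S cell and treats the RL policy as a hypothetical callable. PACT's asynchronous planner and its exact validation procedure are not modeled.

```python
LAKE = ["SFFF", "FHFH", "FFFH", "HFFG"]  # S start, F frozen, H hole, G goal
MOVES = {"L": (0, -1), "D": (1, 0), "R": (0, 1), "U": (-1, 0)}


def simulate(plan, lake=LAKE):
    """Deterministic rollout from S; returns 'goal', 'hole', or 'incomplete'."""
    row, col = next((r, line.index("S")) for r, line in enumerate(lake) if "S" in line)
    for action in plan:
        dr, dc = MOVES[action]
        # Moving off the grid leaves the agent in place, as in FrozenLake.
        row = min(max(row + dr, 0), len(lake) - 1)
        col = min(max(col + dc, 0), len(lake[0]) - 1)
        cell = lake[row][col]
        if cell == "H":
            return "hole"
        if cell == "G":
            return "goal"
    return "incomplete"


def choose_actions(slm_plan, rl_policy, state):
    """Commit to the planner's plan only if simulation verifies it; otherwise act reactively."""
    if slm_plan is not None and simulate(slm_plan) == "goal":
        return list(slm_plan)
    return [rl_policy(state)]


def fallback_policy(state):
    """Hypothetical reactive RL policy; always moves down in this toy."""
    return "D"


print(choose_actions("DDRDRR", fallback_policy, state=0))  # ['D', 'D', 'R', 'D', 'R', 'R']
print(choose_actions("RD", fallback_policy, state=0))      # ['D']
```

The second plan steps into the hole at row 1, column 1, so the sketch rejects it and falls back to a single reactive action. This mirrors how a fast policy covers for a slow or wrong planner.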

Expert Tying Reduces Memory Footprint of Mixture-of-Experts LLMs by Nearly Half

Technology

Artificial Intelligence #tied expert layers#mixture-of-experts

A new arXiv paper from Jaggi proposes Expert Tying, an architectural modification for Mixture-of-Experts (MoE) LLMs that shares expert parameters across consecutive transformer layers. In pretraining experiments on OLMoE, Qwen3, and DeepSeek-style architectures, it reduces the memory footprint by almost 2x with virtually no degradation in perplexity or downstream quality.

Jun 16, 2026 1 source
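Simple parameter counting produces the "almost 2x" figure if each pair of consecutive layers shares one set of experts. This sketch assumes gated expert FFNs with three d_model × d_ff matrices and no biases, and a tie group of two layers. It lumps attention, routers, and embeddings into an untied `other_params` term. The layer and expert sizes are illustrative, not taken from the paper.

```python
def expert_params(n_layers, n_experts, d_model, d_ff, tie_group=1):
    """Expert-FFN parameters stored when every `tie_group` consecutive layers share one expert set.

    Each expert is assumed to be a gated FFN with three d_model x d_ff matrices and no biases.
    """
    n_sets = -(-n_layers // tie_group)  # ceiling division: a leftover layer keeps its own set
    return n_sets * n_experts * 3 * d_model * d_ff


def footprint_ratio(n_layers, n_experts, d_model, d_ff, other_params, tie_group=2):
    """Untied / tied total parameter count; other_params is never tied."""
    untied = expert_params(n_layers, n_experts, d_model, d_ff) + other_params
    tied = expert_params(n_layers, n_experts, d_model, d_ff, tie_group) + other_params
    return untied / tied


# Illustrative sizes: 16 layers, 64 experts per layer, d_model=2048, d_ff=1024.
print(expert_params(16, 64, 2048, 1024))               # 6442450944
print(expert_params(16, 64, 2048, 1024, tie_group=2))  # 3221225472
print(round(footprint_ratio(16, 64, 2048, 1024, other_params=500_000_000), 2))  # 1.87
```

The untied attention, router, and embedding parameters keep their full size, so the overall saving lands somewhat below 2x.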

Causal Model of Theory of Mind in Conflict Offers New Path for AI Mentalizing

Technology

Artificial Intelligence #artificial intelligence#theory of mind

A new research paper by Gurney and Nikolos introduces a structural causal model for theory of mind (ToM) in artificial intelligence, addressing the unresolved question of when mentalizing is warranted in conflict situations. The model treats ToM as a mechanism activated by situational and agent-level conditions, offering a resource-rational decision procedure for AI systems. It specifies four exogenous variables, five endogenous mediators, and three causal pathways leading to epistemic accuracy, with implications for efficiency, trust, and robust artificial social intelligence.

Jun 16, 2026 1 source