theorem proving

5 stories

Artificial Intelligence #reinforcement learning #theorem proving

Process-Verified Reinforcement Learning for Theorem Proving via Lean: A New Path to AI Reliability

A new arXiv preprint presents process-verified reinforcement learning for theorem proving, using the Lean proof assistant as a symbolic process oracle. By parsing proof attempts into tactic sequences and leveraging Lean's type-theoretic feedback, the method delivers dense, verifier-grounded credit signals. Experiments with STP-Lean and DeepSeek-Prover-V1.5 show that tactic-level supervision outperforms outcome-only baselines on the miniF2F and ProofNet benchmarks.

Jul 8, 2026 2 sources

SorryDB Benchmark Tests AI Provers on Real-World Lean Theorem Completion Tasks

Technology

Artificial Intelligence #ai #theorem proving

Researchers present SorryDB, a benchmark of open Lean tasks drawn from 78 GitHub projects. Evaluating a snapshot of 1,000 tasks, they find that current approaches are complementary: agentic methods based on Gemini Flash lead, yet they do not outperform every other approach.

Jun 17, 2026 1 source

MA-ProofBench: New Benchmark Tests LLMs on Formal Theorem Proving in Mathematical Analysis

Technology

Artificial Intelligence #large language models #theorem proving

Researchers introduce MA-ProofBench, the first formal theorem-proving benchmark dedicated to mathematical analysis. It contains 200 theorems across six topics at two difficulty levels. Evaluations show that even the best model, GPT-5.5, achieves only 16% Pass@8 on undergraduate-level problems and 5% on Ph.D.-level problems, highlighting significant limitations of current LLMs in formal mathematical reasoning.

Jun 16, 2026 1 source
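
For readers unfamiliar with the Pass@8 figures in the MA-ProofBench summary, the sketch below shows one common way a pass@k score is computed. It is an assumption that the benchmark uses this estimator; the summary does not say. The function names `pass_at_k` and `benchmark_pass_at_k` and the sample counts are hypothetical, chosen only for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k sampled proofs is correct.

    n: proof attempts sampled for one theorem
    c: attempts among those n that the proof checker accepted
    k: sampling budget being scored (k <= n)

    Uses the standard combinatorial estimator 1 - C(n - c, k) / C(n, k).
    Assumption: the benchmark may compute its metric differently.
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so every size-k subset has a success.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over theorems, given one (n, c) pair per theorem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Made-up data: four theorems, eight attempts each, one theorem solved once.
toy_results = [(8, 1), (8, 0), (8, 0), (8, 0)]
toy_score = benchmark_pass_at_k(toy_results, k=8)
```

When exactly eight attempts are sampled per theorem, pass@8 reduces to the fraction of theorems with at least one accepted proof, so a 16% score would mean roughly one theorem in six was solved within eight tries.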

LLMs Struggle with Multi-Step Logic: New Framework DREAM Boosts Theorem Proving Performance

Technology

Artificial Intelligence #llms #mathematical reasoning

Large language models (LLMs) have shown promise in mathematical reasoning but struggle with multi-step first-order logic (FOL) tasks. A new paper introduces DREAM, a self-adaptive approach that improves both the diversity and the reasoning of generation strategies, raising performance by up to 6.4% on a dataset of 447 theorems.

Jun 16, 2026 1 source

Study Reveals Serious Robustness Flaws in Proof Autoformalization for Lean 4

Technology

Software #lean 4 #proof autoformalization

A new arXiv preprint presents the first systematic study of the robustness of proof autoformalization in Lean 4, introducing a benchmark with global and local perturbations. Evaluating seven recent LLM-based models on miniF2F and MATH-500, the study finds that all of them are sensitive to global paraphrasing and that most fail to faithfully reflect local changes, raising concerns for dependable formal verification.

Jun 16, 2026 1 source