Topic

agentic

10 stories

Artificial Intelligence #benchmark#academic search

ScholarQuest Benchmark Reveals Gaps in Agentic Academic Paper Search for Enterprise AI

A new benchmark called ScholarQuest evaluates LLM-based agents for academic paper search. Built from over 1,000 computer science topics and four research intents, it provides scalable answer construction and a shared retrieval backend. Results show that agentic methods beat single-shot retrieval, but the top agent reaches only 0.314 Recall@100, leaving significant room for improvement in agentic search.
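
For readers unfamiliar with the metric, Recall@100 is the share of a query's relevant papers that appear among the top 100 returned results. The snippet below is a minimal sketch of that standard definition, not ScholarQuest's own evaluation code; the function name `recall_at_k` and the sample paper IDs are hypothetical.

```python
def recall_at_k(ranked_ids, relevant_ids, k=100):
    """Fraction of the relevant items that appear in the top-k results."""
    relevant = set(relevant_ids)
    if not relevant:
        # No ground-truth papers for this query: define recall as 0.0
        return 0.0
    top_k = set(ranked_ids[:k])
    return len(top_k & relevant) / len(relevant)


# Hypothetical example: two relevant papers, one of them retrieved in the top 2
ranked = ["paper_a", "paper_b", "paper_c"]
relevant = ["paper_a", "paper_d"]
score = recall_at_k(ranked, relevant, k=2)
```

Under this definition, a score of 0.314 means that on average fewer than a third of the relevant papers surface in the first 100 results.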

Jul 8, 2026 1 source

Technology

Artificial Intelligence #benchmarking#agentic

Benchmarking Agentic Review Systems: AI Peer Review Achieves 83% Pairwise Accuracy but Falls Short on Error Detection

A study by Nguyen et al. benchmarks two open-source AI review systems and one proprietary system on peer review tasks. The best configuration (OpenAIReview + GPT-5.5) achieves 83.0% pairwise accuracy in tracking paper quality but only 71.6% recall in detecting injected errors. User feedback shows a positive-to-negative vote ratio of 1.44:1, with common complaints about false positives. The research highlights both the potential and the limitations of current AI agents in evaluation tasks.

Jul 8, 2026 1 source

Technology

Artificial Intelligence #robot#learning

Playful Agentic Robot Learning: Autonomous Skill Acquisition Through Self-Directed Play

A research paper presents Playful Agentic Robot Learning, where robots autonomously propose and practice tasks to build a skill library. The RATs system achieves significant gains on downstream tasks without fine-tuning the underlying model.

Jul 8, 2026 1 source

Technology

Artificial Intelligence #artificial intelligence#llm

Agentic RAG Pipeline Achieves 96.5% Clinician Acceptance in Clinical Information Extraction

Standard retrieval-augmented generation fails on clinical data due to missing metadata and cross-document dependencies. Researchers at University Medicine Essen deployed ACIE, an on-premise agentic RAG pipeline that reasons over complete patient contexts and grounds answers in source passages. In an independent study with 7,326 clinician judgments, extractions were accepted 96.5% of the time, with per-type acceptance ranging from 80% to 99%.

Jun 20, 2026 1 source

Technology

Artificial Intelligence #ai#robotics

ENPIRE Framework Enables Autonomous Robot Policy Self-Improvement for Real-World Manipulation Tasks

Researchers introduce ENPIRE, a harness framework that enables coding agents to autonomously improve robot policies through a closed-loop feedback routine. The system achieved a 99% success rate on challenging manipulation tasks, including organizing a pin box and fastening a zip tie. This approach minimizes human effort in real-world robotics training.

Jun 20, 2026 1 source

Technology

Artificial Intelligence #agentic#recommendation

AgenticRec: A Recommender Framework That Aligns LLM Reasoning with User Preferences

Researchers propose AgenticRec, a framework that treats recommendation as a tool-integrated reasoning process. It employs a two-stage training paradigm to overcome misalignment between LLM reasoning trajectories and recommendation feedback, improving fine-grained preference distinction.

Jun 16, 2026 1 source

Technology

Artificial Intelligence #ai#artificial intelligence

New Benchmark IRTS-ToolBench Tests LLMs on Irregular Time Series Question Answering

A research paper introduces IRTS-ToolBench, a benchmark of 1,700 questions spanning 10 task types across 13 domains to evaluate large language models (LLMs) and AI agents on irregular time series question answering (TSQA). The benchmark addresses a gap in existing TSQA benchmarks that assume regular sampling, providing standardized inputs and a reproducible evaluation protocol for verifiable agentic data science.

Jun 16, 2026 2 sources

Technology

Artificial Intelligence #openclaw-skill#collective skill tree search

New Framework Automates Skill Construction for Agentic Large Language Models

A new framework called Collective Skill Tree Search (CSTS) automatically constructs reusable skills for large language model (LLM) agents. It uses two iterative phases, collective generation and collective assessment, to build a diverse, generalizable tree of skills that enhances agentic capabilities in planning, tool use, and environment interaction.

Jun 16, 2026 1 source

Technology

Artificial Intelligence #mage-rag#multimodal

MAGE-RAG: Multigranular Adaptive Graph Evidence Framework Improves Long-Document Multimodal QA Accuracy

The MAGE-RAG research paper introduces a multigranular adaptive graph evidence framework for multimodal retrieval-augmented generation (RAG) in long-document question answering. By building an evidence graph with page and element nodes and using an online controller to iteratively activate and prune evidence, it balances coverage and noise. Experiments show accuracy improvements over existing methods on LongDocURL and MMLongBench-Doc benchmarks.

Jun 16, 2026 1 source

Technology

Artificial Intelligence #visual reasoning#multimodal

Visual-Seeker: Visual-Native AI Agent for Active Visual Reasoning in Multimodal Search

Researchers propose Visual-Seeker, a visual-native multimodal deep search agent that actively harvests fine-grained visual evidence during search. Trained on a synthesized dataset of 5K multimodal trajectories, it achieves state-of-the-art results on five benchmarks, outperforming several proprietary models.

Jun 16, 2026 1 source