iGEN
Visit IGEN World Explore IGEN Expo
EXPLORE UPGRADE PLANS
BREAKING
RAG and LLMs Combined to Generate Personalized Reading Content at Desired Complexity Unassigned Agents in Multi-Agent Path Finding Addressed by Compilation-Based Solvers New Framework Reduces Visual Hallucinations in Multimodal AI Systems Without Retraining MAF Framework Dynamically Optimizes Prompting for Multimodal Sentiment Analysis Study on Pedestrian Attribute Recognition Identifies Sparsity Wall and Optimizes Edge Deployment AI Framework Targets 50% Water Loss in Jordan with LLM and Digital Twin Integration AnonShield: Scalable On-Premise Pseudonymization Cuts Vulnerability Data Processing from 92 Hours to Under 10 Minutes MoFore: A New Self-Supervised Framework Learns Video Representations by Forecasting Future Latent Embeddings Do LLMs Reliably Identify Correct Information Units in Aphasic Discourse? A New Study Evaluates Four Models AI Video Generation Method for Cardiac MRI Addresses Data Scarcity with Latent Motion Modeling RAG and LLMs Combined to Generate Personalized Reading Content at Desired Complexity Unassigned Agents in Multi-Agent Path Finding Addressed by Compilation-Based Solvers New Framework Reduces Visual Hallucinations in Multimodal AI Systems Without Retraining MAF Framework Dynamically Optimizes Prompting for Multimodal Sentiment Analysis Study on Pedestrian Attribute Recognition Identifies Sparsity Wall and Optimizes Edge Deployment AI Framework Targets 50% Water Loss in Jordan with LLM and Digital Twin Integration AnonShield: Scalable On-Premise Pseudonymization Cuts Vulnerability Data Processing from 92 Hours to Under 10 Minutes MoFore: A New Self-Supervised Framework Learns Video Representations by Forecasting Future Latent Embeddings Do LLMs Reliably Identify Correct Information Units in Aphasic Discourse? A New Study Evaluates Four Models AI Video Generation Method for Cardiac MRI Addresses Data Scarcity with Latent Motion Modeling
Home ›› Topics ›› reinforcement learning

Topic

reinforcement learning

7 stories
New Survey Unifies LLM Policy Optimization Methods on First Principles from REINFORCE to GRPO Technology
Artificial Intelligence #ai#llm

New Survey Unifies LLM Policy Optimization Methods on First Principles from REINFORCE to GRPO

A new survey on arXiv revisits LLM policy optimization from first principles, modeling all methods as modifications of either the trajectory probability or reward function. It covers the path from REINFORCE to GRPO and beyond, identifying compound failures that require joint design of both sides.

Jun 16, 2026 1 source
Reward Hacking Still Undefeated: AI Safety Gridworlds Test Shows Exploits Persist Across LLM Scales Technology
Artificial Intelligence #reward hacking#ai safety

Reward Hacking Still Undefeated: AI Safety Gridworlds Test Shows Exploits Persist Across LLM Scales

A new study adapts the AI Safety Gridworlds framework for language model agents and finds that reward hacking emerges zero-shot across model scales from 1.5B to 14B parameters. Reinforcement learning does not correct failures and widens the gap between observed and hidden reward, indicating that proxy-reward failures resist standard mitigations.

Jun 16, 2026 1 source
Auditing Reward Hackability in Code RL Training Environments Reveals 28.5% Weak Test Suites Technology
Artificial Intelligence #auditing#reward hackability

Auditing Reward Hackability in Code RL Training Environments Reveals 28.5% Weak Test Suites

A research paper by Rajan on arXiv measures reward hackability in code reinforcement learning (RL) training environments. On a 49-task sample of SWE-bench Verified, 28.5% of tasks have test suites weak enough that a Docker-verified incorrect patch passes them. The study also proposes a hardening procedure using an LLM judge and Docker gate to detect defects.

Jun 16, 2026 1 source
STRIDE Framework Enhances Reinforcement Learning with Strategic Trajectory Reasoning for Verifiable AI Technology
Artificial Intelligence #reinforcement learning#artificial intelligence

STRIDE Framework Enhances Reinforcement Learning with Strategic Trajectory Reasoning for Verifiable AI

Researchers propose STRIDE, a reinforcement learning framework that uses discriminative estimation to assign credit to strategic patterns in reasoning trajectories. The method outperforms existing techniques across diverse models and tasks.

Jun 16, 2026 1 source
ROSA-RL Uses Reinforcement Learning to Navigate Roundabouts with Uncertainty Awareness Technology
Artificial Intelligence #reinforcement learning#speed advisory

ROSA-RL Uses Reinforcement Learning to Navigate Roundabouts with Uncertainty Awareness

ROSA-RL is an uncertainty-aware speed advisory system for roundabouts that uses reinforcement learning and a Transformer-based model to predict conflict zone occupancy. Evaluated in simulations, it outperforms model-based baselines and nearly matches an ideal scenario with full knowledge.

Jun 16, 2026 1 source
PACT Hybrid Architecture Combines Small Language Model Planning with Reinforcement Learning for Enhanced Decision-Making Technology
Artificial Intelligence #artificial intelligence#language models

PACT Hybrid Architecture Combines Small Language Model Planning with Reinforcement Learning for Enhanced Decision-Making

Researchers propose Plan, Align, Commit, Think (PACT), a hybrid architecture that couples a fast reactive reinforcement learning policy with a slow deliberative small language model (SLM) planner. The SLM asynchronously generates and validates action plans, which are executed directly once verified as safe through simulation. Evaluated on three FrozenLake configurations, PACT outperformed all baselines using a 2B-parameter SLM backbone, demonstrating that deliberative planning and reactive execution complement each other.

Jun 16, 2026 1 source
daVinci-kernel: Reinforcement Learning Framework Automates GPU Kernel Optimization with Co-Evolving Skill Library Technology
Artificial Intelligence #gpu kernel optimization#reinforcement learning

daVinci-kernel: Reinforcement Learning Framework Automates GPU Kernel Optimization with Co-Evolving Skill Library

A new reinforcement learning framework called daVinci-kernel automates GPU kernel optimization by co-evolving skill selection, summarization, and utilization. The framework, detailed in a preprint on arXiv, uses three agents sharing one LLM backbone and achieves 37.2%, 70.6%, and 32.2% on KernelBench Level 1, 2, and 3 respectively, outperforming prior RL-trained models.

Jun 16, 2026 1 source