daVinci-kernel: Reinforcement Learning Framework Automates GPU Kernel Optimization with Co-Evolving Skill Library

A new reinforcement learning framework called daVinci-kernel automates GPU kernel optimization by co-evolving skill selection, summarization, and utilization. The framework, detailed in a preprint on arXiv, uses three agents sharing one LLM backbone and achieves 37.2%, 70.6%, and 32.2% on KernelBench Level 1, 2, and 3 respectively, outperforming prior RL-trained models.

iGEN Editorial

June 16, 2026

Enterprises deploying large-scale AI and high-performance computing (HPC) workloads face a critical bottleneck: optimizing GPU kernels for peak performance. Manual kernel tuning is labor-intensive and requires deep expertise. A new reinforcement learning framework, daVinci-kernel, aims to automate this process by dynamically building and exploiting a skill library. According to a preprint on arXiv, daVinci-kernel jointly trains three agents in a co-evolution loop: a Skill Selection Agent, a Policy Agent, and a Skill Summary Agent.

How daVinci-kernel Works

The daVinci-kernel framework operates as a cooperative multi-agent system in which all three agents share a single LLM backbone. The Skill Selection Agent retrieves relevant techniques from the skill library using BM25, then reranks them with the LLM. The Policy Agent generates CUDA or Triton kernels over multiple turns, conditioned on the selected skills. The Skill Summary Agent distills successful rollouts into reusable skills. Crucially, a candidate skill is added to the library only after execution-based verification confirms that it yields reproducible speedups. The framework is initialized with a structured supervised fine-tuning (SFT) cold start on diversity-filtered data, then jointly optimized end-to-end with multi-turn REINFORCE and per-agent advantage estimation.
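To make the retrieval and verification steps concrete, here is a minimal Python sketch, not the authors' implementation. It scores skill descriptions against a task description with a standard BM25 formula and admits a new skill only when every repeated timing trial beats the baseline. The function names (`bm25_rank`, `maybe_add_skill`), the whitespace tokenizer, the `k1` and `b` defaults, and the "every trial must win" admission rule are all assumptions made for illustration; the preprint's LLM reranking stage is omitted.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization.
    return text.lower().split()


def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Return indices of docs sorted by descending BM25 score for query."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n or 1.0
    # Document frequency: number of docs containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return sorted(range(n), key=lambda i: scores[i], reverse=True)


def verified_speedup(baseline_times, candidate_times, min_speedup=1.0):
    """Hypothetical gate: every paired trial must exceed min_speedup."""
    if not baseline_times or len(baseline_times) != len(candidate_times):
        return False
    return all(
        c > 0 and base / c > min_speedup
        for base, c in zip(baseline_times, candidate_times)
    )


def maybe_add_skill(library, skill, baseline_times, candidate_times):
    """Append skill to library only if the speedup is reproducible."""
    if verified_speedup(baseline_times, candidate_times):
        library.append(skill)
        return True
    return False


skills = [
    "use shared memory tiling for matrix multiply",
    "fuse elementwise ops into one kernel",
    "vectorized loads with float4 for memory bandwidth",
]
top = bm25_rank("matrix multiply tiling", skills)[0]
print(skills[top])
```

In a full system, the top BM25 candidates would be passed to the LLM for reranking before the Policy Agent sees them, and the timing lists would come from actually running the kernels on the target GPU.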

Performance on KernelBench

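The researchers evaluated daVinci-kernel-14B on the KernelBench benchmark. Under the Fast$_1$ threshold, which counts a task as solved only when the generated kernel is both correct and faster than the PyTorch baseline, the model achieved: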

Benchmark Level	Success Rate
Level 1	37.2%
Level 2	70.6%
Level 3	32.2%
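For readers unfamiliar with the metric behind these numbers, the following Python sketch shows how a Fast$_p$ score of this kind can be computed: the fraction of tasks whose kernel is correct and whose speedup over the baseline exceeds the threshold p. The function name `fast_p`, the `(is_correct, speedup)` tuple layout, and the sample data are illustrative assumptions, not taken from the preprint or the KernelBench code.

```python
def fast_p(results, p=1.0):
    """Fraction of tasks that are correct AND have speedup strictly above p.

    results: list of (is_correct, speedup) tuples, where
    speedup = baseline_time / generated_kernel_time.
    """
    if not results:
        return 0.0
    hits = sum(1 for is_correct, speedup in results if is_correct and speedup > p)
    return hits / len(results)


# Made-up example: four tasks, only the first is both correct and faster.
sample = [(True, 1.4), (True, 0.9), (False, 2.0), (True, 1.0)]
print(fast_p(sample, p=1.0))  # 0.25
```

Note that a fast but incorrect kernel (the third entry) earns no credit, and a correct kernel that merely matches the baseline (the fourth entry) does not clear the Fast$_1$ threshold.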

According to the preprint, daVinci-kernel-14B outperformed the strongest prior RL-trained model, a 14B-parameter baseline, across all three levels.

Implications for Enterprise AI Infrastructure

Although it currently targets GPU kernel optimization, the daVinci-kernel framework suggests an approach to automated code optimization that may generalize. For enterprise technology teams, it could reduce the time and expertise required to optimize CUDA and Triton kernels for specific hardware configurations. Because skills enter the dynamically evolving library only after execution-based verification, the library is designed to retain only optimizations with reproducible speedups. The same approach could be extended to other domains where functional correctness is assumed and execution efficiency is the objective, such as database query optimization or network packet processing.

The preprint, authored by Dayuan Fu, Mohan Jiang, Tongyu Wang, Dian Yang, Jiarui Hu, Liming Liu, Jinlong Hou, and Pengfei, is available on arXiv under the title "daVinci-kernel: Co-Evolving Skill Selection, Summarization, and Utilization via RL for GPU Kernel Optimization."
