Large language models (LLMs) have made remarkable progress in software engineering and code generation, but applying them to hardware design, specifically RTL (register-transfer level) code generation, remains challenging because it depends on high-quality training samples. A new research paper, "LLM4RTL: Tool-Assisted LLM for RTL Generation" (arXiv, June 2026), introduces a system that combines a novel data refinement pipeline with a tool-assisted architecture to significantly improve RTL code generation performance.
The JRCRC Pipeline
The authors propose a "judge-renew-check-renew-check" (JRCRC) pipeline that iteratively updates a public dataset using a hierarchy of state-of-the-art commercial LLMs that differ in cost and in capability for RTL code generation. According to the paper, the pipeline provides a cost-effective mechanism for filtering and refining code-generation samples into a higher-quality training dataset. This approach makes it possible to build specialized LLM systems through fine-tuning or low-rank adaptation without massive manual curation.
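To make the control flow concrete, here is a minimal Python sketch of one possible reading of the judge-renew-check-renew-check sequence. It is not the authors' implementation: the `Sample` type, the `jrcrc` and `refine` functions, and the toy "models" are all hypothetical, and the sketch assumes that the first renew round uses a cheaper model, that the second escalates to a stronger one, and that both rounds rewrite the original sample.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Sample:
    prompt: str  # natural-language specification
    code: str    # candidate RTL for that specification


# Hypothetical interfaces; in practice each would wrap an LLM call or a tool.
Judge = Callable[[Sample], bool]    # True if the sample is good enough as-is
Renew = Callable[[Sample], Sample]  # returns a rewritten sample
Check = Callable[[Sample], bool]    # True if a rewritten sample is acceptable


def jrcrc(sample: Sample, judge: Judge, cheap_renew: Renew,
          strong_renew: Renew, check: Check) -> Optional[Sample]:
    """Judge first, then renew/check with the cheap model, then the strong one."""
    if judge(sample):
        return sample
    for renew in (cheap_renew, strong_renew):  # escalate only after a failed check
        candidate = renew(sample)
        if check(candidate):
            return candidate
    return None  # still failing after both rounds: drop the sample


def refine(samples: List[Sample], judge: Judge, cheap_renew: Renew,
           strong_renew: Renew, check: Check) -> List[Sample]:
    """Run every sample through the pipeline and keep the survivors."""
    results = (jrcrc(s, judge, cheap_renew, strong_renew, check) for s in samples)
    return [r for r in results if r is not None]


# Toy stand-ins for the models (assumptions for illustration only).
def looks_complete(s: Sample) -> bool:
    text = s.code.strip()
    return text.startswith("module") and text.endswith("endmodule")


def cheap_fix(s: Sample) -> Sample:
    return Sample(s.prompt, s.code.strip() + "\nendmodule")


def strong_fix(s: Sample) -> Sample:
    return Sample(s.prompt, "module top;\n" + s.code.strip() + "\nendmodule")


raw = [
    Sample("empty module", "module a; endmodule"),  # passes the judge
    Sample("missing end", "module b;"),             # fixed by the cheap round
    Sample("bare assign", "assign y = a & b;"),     # needs the strong round
]
refined = refine(raw, looks_complete, cheap_fix, strong_fix, looks_complete)
for s in refined:
    print(repr(s.code))
```

The cost argument shows up in the loop order: the stronger, presumably more expensive model is called only for samples that the cheaper round could not repair.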
Weaknesses and Tool-Assisted Architecture
Experiments conducted by the research team identified common weaknesses of LLMs in rule-based reasoning and logic, which are critical for RTL code generation. To address these shortcomings, the authors developed an architecture that incorporates pre-processing tools to dynamically assist the LLM in inferring logical relationships from tabular data formats. This tool-assisted approach mitigates the identified weaknesses and enhances the model's ability to generate correct Verilog code.
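As an illustration of what such a pre-processing tool might do, the sketch below converts a truth table into a sum-of-products expression in Verilog operator syntax, so that the model receives an explicit logical relationship instead of having to infer it from rows of 0s and 1s. This is an assumption about the kind of tabular input involved, not the paper's actual tool; the function name `truth_table_to_sop` and the row format are hypothetical.

```python
from typing import List, Sequence, Tuple

Row = Tuple[Sequence[int], int]  # (input bits, output bit)


def truth_table_to_sop(inputs: Sequence[str], rows: Sequence[Row]) -> str:
    """Build an unminimized sum-of-products expression from a truth table."""
    terms: List[str] = []
    for bits, out in rows:
        if len(bits) != len(inputs):
            raise ValueError("each row needs one bit per input name")
        if out == 1:
            # One product term per row whose output is 1.
            literals = [name if bit else f"~{name}"
                        for name, bit in zip(inputs, bits)]
            terms.append("(" + " & ".join(literals) + ")")
    if not terms:
        return "1'b0"  # the output is never 1
    return " | ".join(terms)


# XOR written as a truth table.
xor_rows = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
expr = truth_table_to_sop(["a", "b"], xor_rows)
print(f"assign y = {expr};")  # assign y = (~a & b) | (a & ~b);
```

The expression is deliberately left unminimized; a real tool could simplify it, but even this form turns a table lookup into a rule the model can copy into an `assign` statement.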
Performance Results
With the JRCRC pipeline and the tool-assisted architecture, LLM4RTL achieves significant overall performance gains on the VerilogEval benchmark. The system outperforms many state-of-the-art methods and delivers performance comparable to GPT-4o while using a significantly smaller LLM. The following table summarizes the key comparison:
| Aspect | LLM4RTL | GPT-4o |
|---|---|---|
| Model size | Significantly smaller | Larger |
| VerilogEval performance | Comparable to GPT-4o | Reference point |
| Approach | JRCRC pipeline + tool-assisted architecture | Standard model, no JRCRC or tool assistance |
Implications for Hardware Design Automation
LLM4RTL demonstrates that targeted pipeline design and tool integration can unlock LLM capabilities for specialized domains like chip design. By combining a data refinement pipeline with pre-processing tools that address logical reasoning gaps, the system achieves high performance without relying on the largest available models. This has practical implications for cost and deployment in electronic design automation (EDA) workflows.
The research was conducted by Jing, Chu, Robert, Yan, Ning, Mortazavi, and Masood S., and published on arXiv under the Computer Science > Hardware Architecture category. The full paper is available for review.