Imperfect Visual Verifiers Boost LLM Code Customization, Study on TikZ Finds

A new study explores using imperfect visual verifiers for iterative refinement in LLM-based code customization of TikZ graphics. Despite the lack of a deterministic oracle, imperfect verifiers achieved F1-scores up to 0.815, significantly improving customization success for weaker models and providing stable gains for stronger ones.

iGEN Editorial

June 16, 2026

One challenge in code generation is customizing existing programs that produce visual outputs, such as those written in TikZ, a graphics language for LaTeX. Unlike generating code from scratch, editing requires localized, semantics-preserving changes. A recent empirical study posted on arXiv investigates whether iterative refinement remains effective when the verifier providing feedback is itself unreliable.

Researchers evaluated multiple LLM-based and tool-augmented visual verifiers within iterative refinement pipelines on TikZ code customization tasks. They defined visual code customization as an iterative editing problem with an imperfect oracle and manually annotated refinement trajectories to assess verifier behavior and feedback quality.
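The refinement loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the names `refine`, `Verdict`, `toy_edit`, and `toy_verify` are hypothetical, and the toy verifier inspects the source text where a real visual verifier would inspect the rendered image.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Verdict:
    accepted: bool   # verifier believes the instruction is satisfied
    feedback: str    # hint for the next edit (may be wrong or incomplete)


def refine(code: str,
           instruction: str,
           edit: Callable[[str, str, str], str],
           verify: Callable[[str, str], Verdict],
           max_rounds: int = 3) -> Tuple[str, List[Verdict]]:
    """Edit `code` until the (fallible) verifier accepts or rounds run out."""
    history: List[Verdict] = []
    feedback = ""
    for _ in range(max_rounds):
        code = edit(code, instruction, feedback)
        verdict = verify(code, instruction)
        history.append(verdict)
        if verdict.accepted:
            break
        feedback = verdict.feedback
    return code, history


# Hypothetical stand-ins so the sketch runs without a model or LaTeX toolchain.
def toy_edit(code: str, instruction: str, feedback: str) -> str:
    # Without feedback, only the first circle is recolored (an incomplete edit).
    if feedback:
        return code.replace("[blue]", "[red]")
    return code.replace("[blue]", "[red]", 1)


def toy_verify(code: str, instruction: str) -> Verdict:
    # Assumption: a source-level check stands in for image-based verification.
    if "[blue]" in code:
        return Verdict(False, "One circle is still blue; recolor every circle.")
    return Verdict(True, "")


source = r"\draw[blue] (0,0) circle (1); \draw[blue] (3,0) circle (1);"
final_code, history = refine(source, "Make all circles red", toy_edit, toy_verify)
```

In this toy run the first edit is incomplete, the verifier rejects it with feedback, and the second edit is accepted. With a fallible verifier, either of those two verdicts could be wrong, which is the situation the study examines.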

Key findings include:

Finding	Result
Verifier F1-score	Up to 0.815
Improvement for Qwen3-vl-30b-a3b-Instruct	+11 to +20 perfect customizations
Improvement for Gemini-3	+5 perfect customizations
Benefit of accurate verification for strong models	Prevents premature acceptance
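For readers unfamiliar with the metric, the F1-score is the harmonic mean of precision and recall. The sketch below shows how it could be computed from a verifier's accept/reject decisions against human annotations. The labels are invented for illustration and are not data from the study, and treating "instruction applied" as the positive class is an assumption.

```python
from typing import Sequence


def f1_score(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """F1 for the positive class (True = 'instruction was applied')."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative labels only (hypothetical, not taken from the paper).
verifier_says = [True, True, False, True, False, True]
human_says = [True, False, False, True, True, True]

score = f1_score(verifier_says, human_says)
```

Here the verifier has three true positives, one false acceptance, and one false rejection, so precision and recall are both 0.75 and the F1-score is 0.75.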

The study used TikZ as a case study because it isolates core difficulties: weak code structure, fine-grained visual semantics, and difficult feature localization. The researchers found that feedback is effective only when it precisely identifies image issues, provides actionable guidance, addresses all relevant problems, and remains grounded in the original instruction.

Stronger models such as Gemini-3 gained fewer perfect customizations (+5) than weaker models, but they benefited more from accurate verification, which prevents premature acceptance of incomplete edits. For the weaker Qwen3-vl-30b-a3b-Instruct, imperfect verifiers added between 11 and 20 perfect customizations.

The study's authors are Charly Reux, Mathieu Acher, Djamel Eddine Khelladi, Clément Quinton, and Olivier Barais, who describe their evaluation as large-scale. They emphasized that even imperfect verifiers can determine with moderate accuracy whether a visual instruction has been applied to the code.

For enterprise technology leaders dealing with automated documentation or graphics generation, such as supply chain diagrams or product illustrations, this research suggests that imperfect verification can still be a practical tool. Instead of requiring perfect automated checks, organizations can use iterative refinement with fallible verifiers to improve code customization outcomes, especially when using less capable models.

The paper, "Imperfect Visual Verification for Code Edition: A Case Study on TikZ," is available on arXiv. Its findings indicate that verifiers, though fallible, can substantially improve LLM-based code editing for visual programs.
