
StateGen Platform Generates Synthetic Training Data for Tool-Augmented LLMs with 9.66/10 Hallucination Score

Researchers introduce StateGen, a synthetic data generation platform that produces scored, reasoning-trace-rich training conversations for tool-augmented LLMs. The platform uses a four-role LLM loop and an authoritative state manager to eliminate the dominant class of tool-call hallucinations, achieving a tool-call hallucination score of 9.66/10 across 64,698 evaluated conversations.

iGEN Editorial
June 16, 2026

Enterprise teams building AI agents that interact with external tools face a chronic shortage of high-quality training data. Manual annotation is expensive, production data carries privacy risks, and public datasets rarely capture multi-turn tool use. According to a paper published on arXiv, researchers have developed StateGen, a synthetic data generation platform designed to address this gap.

StateGen orchestrates a four-role LLM loop: a persona-conditioned user simulator, an agent under test, a state-grounded tool simulator, and a multi-axis LLM judge. The architectural core is an authoritative state manager that maintains a structured world-state object across conversation turns. The paper describes this as enforcing a "backend-is-truth" invariant, which by construction eliminates the dominant class of tool-call hallucinations.
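
The "backend-is-truth" idea can be illustrated with a minimal sketch. The paper's actual implementation is not described in this article, so the `StateManager` class, the two order tools, and the `orders` data below are all hypothetical. The point shown is only that every tool result is read from, or written to, one structured state object, so a tool response cannot describe a world that differs from it.

```python
from copy import deepcopy


class StateManager:
    """Hypothetical authoritative world state shared across conversation turns."""

    def __init__(self, initial):
        self._state = deepcopy(initial)

    def read(self, key):
        # Return a copy so callers cannot mutate the state without write().
        return deepcopy(self._state[key])

    def write(self, key, value):
        self._state[key] = deepcopy(value)


def get_order_status(state, order_id):
    # Hypothetical tool: the answer is looked up in the state, never invented.
    orders = state.read("orders")
    if order_id not in orders:
        return {"ok": False, "error": "unknown order"}
    return {"ok": True, "status": orders[order_id]["status"]}


def cancel_order(state, order_id):
    # Hypothetical tool: a side effect is committed to the state before replying.
    orders = state.read("orders")
    if order_id not in orders:
        return {"ok": False, "error": "unknown order"}
    orders[order_id]["status"] = "cancelled"
    state.write("orders", orders)
    return {"ok": True, "status": "cancelled"}


state = StateManager({"orders": {"A1": {"status": "shipped"}}})
before = get_order_status(state, "A1")
cancel_order(state, "A1")
after = get_order_status(state, "A1")
missing = get_order_status(state, "Z9")
```

Here `before` reports the order as shipped, `after` reports it as cancelled, and `missing` is an explicit error rather than a made-up status.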

How StateGen Works

The platform produces scored, reasoning-trace-rich training conversations. The four roles interact as follows:

  • Persona-conditioned user simulator: Generates diverse user queries based on a 23-dimensional trait vector, enabling persona-driven variation.
  • Agent under test: The LLM being trained to use tools.
  • State-grounded tool simulator: Simulates tool responses based on the shared state object, ensuring consistency.
  • Multi-axis LLM judge: Evaluates the conversation on multiple criteria, providing a score.
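
The interaction between the four roles can be sketched as a simple loop. This is an illustration under stated assumptions, not the paper's code: each role is a plain Python function standing in for an LLM call, and the function names, the `check_stock` tool, the stock data, and the two judge axes are hypothetical. Only the 23-dimensional persona vector is taken from the article.

```python
import random

TRAIT_DIMS = 23  # persona vector size reported in the article


def make_persona(seed):
    # Hypothetical persona: 23 trait values in [0, 1).
    rng = random.Random(seed)
    return [rng.random() for _ in range(TRAIT_DIMS)]


def user_simulator(persona, turn):
    # Stand-in for an LLM call: one trait switches the phrasing style.
    style = "brief" if persona[0] < 0.5 else "detailed"
    return f"[{style}] turn {turn}: how many widgets are in stock?"


def agent_under_test(user_msg):
    # Stub agent: always decides to call the hypothetical check_stock tool.
    return {"tool": "check_stock", "args": {"sku": "widget"}}


def tool_simulator(state, call):
    # Tool results are derived from the shared state only.
    if call["tool"] == "check_stock":
        sku = call["args"]["sku"]
        return {"sku": sku, "count": state["stock"].get(sku, 0)}
    return {"error": "unknown tool"}


def judge(transcript, state):
    # Toy two-axis judge, each axis scored out of 10.
    grounded = sum(
        1 for t in transcript
        if t["result"].get("count") == state["stock"].get(t["call"]["args"]["sku"], 0)
    )
    answered = sum(1 for t in transcript if "error" not in t["result"])
    n = len(transcript)
    return {"tool_grounding": 10.0 * grounded / n, "completion": 10.0 * answered / n}


def run_conversation(seed, turns=2):
    state = {"stock": {"widget": 7}}
    persona = make_persona(seed)
    transcript = []
    for turn in range(turns):
        user_msg = user_simulator(persona, turn)
        call = agent_under_test(user_msg)
        result = tool_simulator(state, call)
        transcript.append({"user": user_msg, "call": call, "result": result})
    return {"persona": persona, "transcript": transcript, "scores": judge(transcript, state)}


conversation = run_conversation(seed=0)
```

Each returned record bundles the persona, the transcript, and the judge scores, which mirrors the "scored training conversation" output the article describes.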

StateGen extends naturally to hierarchical multi-agent settings by declaring sub-agents as tools, all sharing the same state object. This allows the platform to generate data for complex workflows where multiple agents collaborate.
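
A minimal sketch of "sub-agents as tools" follows. The agent names, the tool registry, and the stock and shipment fields are hypothetical; the sketch only shows that a top-level agent can invoke sub-agents through the same tool interface while every participant reads and writes one shared state object.

```python
def inventory_agent(state, args):
    # Hypothetical sub-agent: reserves stock in the shared state.
    sku, qty = args["sku"], args["qty"]
    if state["stock"].get(sku, 0) < qty:
        return {"ok": False, "reason": "insufficient stock"}
    state["stock"][sku] -= qty
    return {"ok": True, "reserved": qty}


def shipping_agent(state, args):
    # Hypothetical sub-agent: records a shipment in the same state object.
    state["shipments"].append({"sku": args["sku"], "qty": args["qty"]})
    return {"ok": True, "shipment_id": len(state["shipments"])}


# Sub-agents are registered exactly like ordinary tools.
TOOLS = {"inventory_agent": inventory_agent, "shipping_agent": shipping_agent}


def call_tool(state, name, args):
    return TOOLS[name](state, args)


def top_level_agent(state, sku, qty):
    # The parent agent delegates by making tool calls to its sub-agents.
    reserved = call_tool(state, "inventory_agent", {"sku": sku, "qty": qty})
    if not reserved["ok"]:
        return reserved
    return call_tool(state, "shipping_agent", {"sku": sku, "qty": qty})


shared_state = {"stock": {"widget": 5}, "shipments": []}
first = top_level_agent(shared_state, "widget", 3)
second = top_level_agent(shared_state, "widget", 3)
```

The second request fails because the first one already reduced the shared stock, which is the kind of cross-agent consistency a single state object provides.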

Performance and Evaluation

The researchers reported results on 64,698 evaluated conversations across three production corpora. Key metrics include:

Metric                             Value
Tool-call hallucination score      9.66 / 10
Persona trait vector dimensions    23
Evaluated conversations            64,698
External systems compared          8

The researchers kept the training set cleanly separated from a golden evaluation set. According to the paper, a per-criterion gap analysis between the two sets indicated that the generated data does not simply reward memorization.
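
The article does not define the gap analysis, so the following is only one plausible reading: for each judge criterion, compare the mean score on the training split with the mean score on the golden split. The function name, the criterion names, and the scores are illustrative placeholders, not results from the paper.

```python
from statistics import mean


def per_criterion_gap(train_scores, golden_scores):
    # Assumed definition: mean(train) - mean(golden) for each judge criterion.
    criteria = train_scores[0].keys()
    return {
        c: mean(s[c] for s in train_scores) - mean(s[c] for s in golden_scores)
        for c in criteria
    }


# Illustrative placeholder scores only.
train = [{"grounding": 9.0, "helpfulness": 8.0}, {"grounding": 10.0, "helpfulness": 8.0}]
golden = [{"grounding": 9.5, "helpfulness": 7.0}]
gaps = per_criterion_gap(train, golden)
```

Under this reading, gaps near zero on every criterion would suggest that quality on held-out conversations tracks quality on training conversations, while a large positive gap on one criterion would flag it for closer inspection.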

Comparison with Existing Platforms

According to the paper, a comparison with eight external systems found that no single publicly available platform combines multi-turn generation, state-grounded tool simulation, hierarchical multi-agent support, and built-in judge scoring. StateGen brings all four capabilities together in one platform.

Implications for Enterprise AI

For organizations developing tool-augmented LLMs for supply chain, logistics, or trade applications, StateGen offers a way to generate large volumes of realistic training data without exposing sensitive production data. The platform's ability to produce scored conversations with reasoning traces could accelerate the development of AI agents that reliably interact with APIs, databases, and enterprise systems. The 23-dimensional persona vector also allows fine-grained control over user behavior, enabling the simulation of diverse scenarios that reflect real-world usage patterns.

