Phase-Aware Guidance Injection Boosts Recurrent MAPPO for Assembly-Line Disruption Recovery

Researchers propose a phase-aware guidance injection framework for recurrent MAPPO in assembly-line disruption recovery. The framework integrates heterogeneous recovery hints at decision time without redesigning the actor. Experiments show that high-quality rule guidance yields the strongest gains, while LLM guidance offers intermediate improvements.

iGEN Editorial

June 16, 2026

Disruption recovery in industrial assembly lines demands rapid decisions in response to machine faults, worker absences, and emergency orders. Traditional approaches rely on rigid handcrafted recovery logic or adaptive policies that learn from data but cannot readily exploit diverse external recovery knowledge at the moment of decision. This gap often leads to prolonged abnormal recovery time (ART) and risks to on-time delivery (OTD).

Researchers from multiple institutions have introduced a phase-aware guidance injection framework that augments a trained recurrent Multi-Agent Proximal Policy Optimization (RMAPPO) scheduling policy by biasing its action logits during evaluation. The framework provides a unified decision-time interface for integrating heterogeneous recovery hints (rule-based, replay-based, and online LLM-based guidance) and activates intervention only during the abnormal and recovery phases of the assembly line.

The Problem of Assembly-Line Disruption

Assembly lines are complex multi-agent systems where coordinated scheduling is critical. When disruptions occur—such as a machine fault or worker absence—recovery actions must be taken quickly to minimize downtime. Existing methods either depend on pre-programmed recovery rules that lack adaptability or on reinforcement learning policies that cannot ingest external knowledge at decision time. The result is suboptimal recovery performance, increased abnormal recovery time, and reduced on-time delivery.

Phase-Aware Guidance Injection Framework

The proposed framework builds on recurrent MAPPO, a multi-agent reinforcement learning algorithm that uses recurrent neural networks to handle partially observable environments. During evaluation, the framework biases the action logits of the policy using guidance from external sources. This logit-level injection allows the policy to incorporate domain expertise or historical recovery data without needing to retrain or redesign the actor network.
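
The logit-level injection described above can be illustrated with a minimal sketch. It assumes the guidance arrives as a per-action bias vector that is scaled and added to the actor's logits before the softmax; the function names, the `strength` parameter, and the numbers are illustrative assumptions, not details from the paper.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a sequence of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def inject_guidance(
    logits: Sequence[float], bias: Sequence[float], strength: float = 1.0
) -> List[float]:
    """Add a scaled guidance bias to the actor's logits.

    The actor network is not modified; only its output is shifted.
    `strength` is a hypothetical knob controlling how hard the hint pushes.
    """
    if len(logits) != len(bias):
        raise ValueError("logits and bias must have the same length")
    return [l + strength * b for l, b in zip(logits, bias)]


# Illustrative values: the trained policy prefers action 0,
# while the external hint favours action 2.
policy_logits = [1.0, 0.5, 0.75]
guidance_bias = [0.0, 0.0, 1.0]

probs_before = softmax(policy_logits)
probs_after = softmax(inject_guidance(policy_logits, guidance_bias, strength=2.0))
```

In this toy case the most probable action shifts from action 0 to action 2 once the bias is applied, while the policy's own preferences among the other actions are preserved.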

Importantly, the framework is phase-aware: it activates guidance injection only during abnormal and recovery phases, leaving normal operations untouched. This targeted intervention prevents unnecessary bias during steady-state production.
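
A short sketch of the phase gate, under the assumption that the environment exposes a discrete phase label at each step. The `Phase` enum, `GUIDED_PHASES`, and `phase_gated_logits` are hypothetical names chosen for illustration; the paper's actual phase detection is not described in this article.

```python
from enum import Enum
from typing import List, Sequence


class Phase(Enum):
    NORMAL = "normal"
    ABNORMAL = "abnormal"
    RECOVERY = "recovery"


# Guidance is only allowed to act in these phases (assumed labels).
GUIDED_PHASES = frozenset({Phase.ABNORMAL, Phase.RECOVERY})


def phase_gated_logits(
    logits: Sequence[float],
    bias: Sequence[float],
    phase: Phase,
    strength: float = 1.0,
) -> List[float]:
    """Return biased logits in abnormal/recovery phases, raw logits otherwise."""
    if phase not in GUIDED_PHASES:
        # Steady-state production: leave the learned policy untouched.
        return list(logits)
    return [l + strength * b for l, b in zip(logits, bias)]


logits = [1.0, 0.5, 0.75]
bias = [0.0, 0.0, 2.0]

normal_out = phase_gated_logits(logits, bias, Phase.NORMAL)
abnormal_out = phase_gated_logits(logits, bias, Phase.ABNORMAL)
recovery_out = phase_gated_logits(logits, bias, Phase.RECOVERY)
```

Keeping the gate outside the actor means the same trained policy runs in every phase; only the post-processing of its logits changes.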

Three Guidance Modes

The framework supports three distinct types of guidance:

Rule-based guidance: Uses handcrafted recovery rules drawn from domain expertise. According to the paper, this mode yields the strongest performance gains.
Replay-based guidance: Leverages past successful recovery episodes. Performance degrades smoothly when the availability of relevant replays is imperfect.
Online LLM guidance: Employs a large language model to generate recovery suggestions in real time. This mode yields intermediate improvements: better than the unguided learned policy, but smaller than those from high-quality rules.

Guidance Mode	Reported Effect	Key Characteristic
Rule-based	Strongest gains	Handcrafted rules from domain expertise
Replay-based	Degrades smoothly as relevant replays become scarce	Past successful recovery episodes
Online LLM	Intermediate improvements	Real-time suggestions without task-specific training
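
One way to picture a unified decision-time interface for the three modes is a shared method that turns any hint into a per-action bias vector. The sketch below is an assumption about how such an interface could look, not the authors' design: the class names, the disruption labels, and the `ask` callable standing in for an LLM call are all hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Protocol, Sequence, Tuple


class GuidanceSource(Protocol):
    """Anything that can turn a disruption label into a per-action bias."""

    def bias(self, disruption: str, num_actions: int) -> List[float]: ...


def one_hot_bias(action: Optional[int], num_actions: int, weight: float) -> List[float]:
    """Encode a suggested action as a bias vector; no suggestion means zero bias."""
    vec = [0.0] * num_actions
    if action is not None and 0 <= action < num_actions:
        vec[action] = weight
    return vec


class RuleGuidance:
    """Handcrafted mapping from disruption type to a preferred action."""

    def __init__(self, rules: Dict[str, int], weight: float = 2.0) -> None:
        self.rules, self.weight = rules, weight

    def bias(self, disruption: str, num_actions: int) -> List[float]:
        return one_hot_bias(self.rules.get(disruption), num_actions, self.weight)


class ReplayGuidance:
    """Suggests the most frequent action from past successful recoveries."""

    def __init__(self, episodes: Sequence[Tuple[str, int]], weight: float = 1.0) -> None:
        self.episodes, self.weight = list(episodes), weight

    def bias(self, disruption: str, num_actions: int) -> List[float]:
        counts = Counter(a for d, a in self.episodes if d == disruption)
        action = counts.most_common(1)[0][0] if counts else None
        # With no matching replay the bias is zero and the policy acts alone.
        return one_hot_bias(action, num_actions, self.weight)


class LLMGuidance:
    """Wraps a callable standing in for an online LLM query (hypothetical)."""

    def __init__(self, ask: Callable[[str], Optional[int]], weight: float = 1.0) -> None:
        self.ask, self.weight = ask, weight

    def bias(self, disruption: str, num_actions: int) -> List[float]:
        return one_hot_bias(self.ask(disruption), num_actions, self.weight)


rule = RuleGuidance({"machine_fault": 2})
replay = ReplayGuidance(
    [("worker_absence", 1), ("worker_absence", 1), ("worker_absence", 0)]
)
llm = LLMGuidance(lambda d: 0 if d == "emergency_order" else None)

rule_bias = rule.bias("machine_fault", 3)
replay_bias = replay.bias("worker_absence", 3)
no_replay_bias = replay.bias("machine_fault", 3)
llm_bias = llm.bias("emergency_order", 3)
```

Because every source returns the same kind of vector, the injection step does not need to know which mode produced the hint, and a source with nothing to say simply contributes a zero bias.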

Experimental Results on AssemblyLineEnv

The researchers conducted experiments on a custom environment called AssemblyLineEnv, designed to simulate realistic assembly-line disruptions. The results indicate that decision-time guidance injection lets the policy exploit heterogeneous recovery hints without altering the learned actor network. High-quality rule guidance produced the strongest improvement in ART and OTD, while replay-based guidance degraded smoothly rather than abruptly when relevant replays were only partially available. Online LLM guidance, despite its lack of task-specific training, still delivered intermediate gains.
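
The article does not state how ART and OTD are computed, so the following sketch rests on assumed definitions: ART as the mean length, in time steps, of each contiguous run of non-normal phases, and OTD as the fraction of orders completed on or before their due time. The function names and the sample trace are illustrative only and are not results from the paper.

```python
from typing import List, Sequence, Tuple


def abnormal_recovery_time(phases: Sequence[str]) -> float:
    """Mean length of contiguous non-'normal' runs in a per-step phase trace."""
    runs: List[int] = []
    current = 0
    for phase in phases:
        if phase != "normal":
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        # The trace ended while the line was still recovering.
        runs.append(current)
    return sum(runs) / len(runs) if runs else 0.0


def on_time_delivery(orders: Sequence[Tuple[float, float]]) -> float:
    """Fraction of (completion_time, due_time) pairs finished by the due time."""
    if not orders:
        raise ValueError("at least one order is required")
    on_time = sum(1 for done, due in orders if done <= due)
    return on_time / len(orders)


# Made-up trace with two disruptions lasting 3 steps and 1 step.
trace = ["normal", "abnormal", "recovery", "recovery", "normal", "abnormal", "normal"]
# Made-up orders as (completion_time, due_time).
orders = [(10.0, 12.0), (15.0, 14.0), (20.0, 20.0), (30.0, 40.0)]

art = abnormal_recovery_time(trace)
otd = on_time_delivery(orders)
```

Under these assumed definitions, a lower ART and a higher OTD both indicate better recovery.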

The authors conclude that this approach can be readily applied to existing RMAPPO systems, offering a practical pathway for manufacturers to incorporate external knowledge into disruption recovery without costly retraining.

For enterprise decision-makers evaluating AI in supply chain and manufacturing, the phase-aware guidance injection framework represents a method to combine the adaptability of reinforcement learning with the reliability of expert knowledge. It addresses the practical challenge of integrating diverse recovery strategies—from manual rules to LLM-generated plans—into a single, coherent scheduling system. While the paper focuses on assembly lines, the underlying technique of logit-level injection could extend to other multi-agent coordination problems in logistics, warehousing, and freight operations.
