PACT Hybrid Architecture Combines Small Language Model Planning with Reinforcement Learning for Enhanced Decision-Making

Researchers propose Plan, Align, Commit, Think (PACT), a hybrid architecture that couples a fast reactive reinforcement learning policy with a slow deliberative small language model (SLM) planner. The SLM asynchronously generates and validates action plans, which are executed directly once verified as safe through simulation. Evaluated on three FrozenLake configurations, PACT outperformed all baselines using a 2B-parameter SLM backbone, demonstrating that deliberative planning and reactive execution complement each other.

iGEN Editorial

June 16, 2026

Reinforcement learning (RL) policies often degrade when deployed in unfamiliar environments because they lack explicit deliberation. To address this, researchers have introduced PACT (Plan, Align, Commit, Think), a hybrid architecture that combines a fast, reactive RL policy with a slow, deliberative Small Language Model (SLM) planner.

Architecture: Dual-System Decision-Making

According to the paper, PACT invokes the SLM asynchronously to generate and validate candidate action plans. The SLM operates in a deliberative mode, and each plan it produces is checked in simulation for three properties: that it is safe, feasible, and complete. Once a plan passes verification, it is executed directly, bypassing the RL policy entirely. Because the design requires no retraining or modification of the existing RL policy, it can be layered on top of an already-trained agent.
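The verification step can be illustrated with a small sketch. The code below is not the paper's implementation: it assumes a deterministic model of a FrozenLake-style grid, and the layout, the `verify_plan` function, and the rule that stepping off the grid counts as infeasible are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

# Hypothetical 4x4 layout in the style of FrozenLake:
# S = start, F = frozen (safe), H = hole, G = goal.
GRID = [
    "SFFF",
    "FHFH",
    "FFFH",
    "HFFG",
]

# Row/column offsets for each action name (names are assumptions).
MOVES = {"left": (0, -1), "down": (1, 0), "right": (0, 1), "up": (-1, 0)}


def verify_plan(grid: List[str], plan: List[str]) -> Tuple[bool, str]:
    """Roll a plan forward in a deterministic model of the grid.

    Returns (True, "ok") only if the plan is feasible (every move is
    known and stays on the grid), safe (never enters a hole), and
    complete (reaches the goal).
    """
    rows, cols = len(grid), len(grid[0])
    # Locate the start cell.
    r, c = next(
        (i, j)
        for i, row in enumerate(grid)
        for j, ch in enumerate(row)
        if ch == "S"
    )
    for step, action in enumerate(plan):
        if action not in MOVES:
            return False, f"infeasible: unknown action {action!r} at step {step}"
        dr, dc = MOVES[action]
        nr, nc = r + dr, c + dc
        if not (0 <= nr < rows and 0 <= nc < cols):
            return False, f"infeasible: step {step} leaves the grid"
        if grid[nr][nc] == "H":
            return False, f"unsafe: step {step} enters a hole"
        r, c = nr, nc
        if grid[r][c] == "G":
            return True, "ok"
    return False, "incomplete: plan ends before the goal"


if __name__ == "__main__":
    print(verify_plan(GRID, ["down", "down", "right", "down", "right", "right"]))
    print(verify_plan(GRID, ["right", "down"]))
```

Only a plan for which `verify_plan` returns `True` would be handed to the executor in this sketch; a rejected plan carries a reason string that could, in principle, be fed back to the planner.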

The experiments use a 2-billion-parameter SLM as the planner backbone.

Evaluation on FrozenLake

The researchers evaluated PACT on three configurations of the FrozenLake environment, each of increasing difficulty. FrozenLake is a classic grid-world problem where an agent must navigate from start to goal while avoiding holes. The results showed that PACT outperformed all baselines across the tested configurations.

"Deliberative planning and reactive execution are more powerful in concert than either is alone in these settings."

The study highlights that the combination of fast reactive responses and slow, deliberative planning enables the system to handle unfamiliar situations where pure RL policies would typically fail.

Implications for Autonomous Systems

While the research was conducted in a simulated environment, the PACT architecture has potential applications for autonomous systems that require both immediate reaction and long-term planning. For example, in robotics or automated control, a system could use a reactive policy for routine operations while invoking the SLM planner when encountering novel or uncertain conditions. The asynchronous invocation means the deliberative process does not slow down real-time responses, as the SLM runs in parallel.
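One way to picture asynchronous invocation is a background thread that runs the planner while the reactive policy keeps acting. The sketch below is an assumption-laden illustration rather than the authors' code: `slow_planner` and `reactive_policy` are hypothetical stand-ins (no real SLM or trained policy is involved), the sleep times are arbitrary, and plan verification is omitted for brevity.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional


def slow_planner(state: int) -> List[str]:
    """Hypothetical stand-in for the SLM planner; sleeps to mimic latency."""
    time.sleep(0.05)
    return ["down", "right"]


def reactive_policy(state: int) -> str:
    """Hypothetical stand-in for the trained RL policy; returns instantly."""
    return "right"


def run_episode(max_steps: int = 50) -> List[str]:
    """Act with the reactive policy until the background plan is ready."""
    trace: List[str] = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start planning in the background; the control loop is not blocked.
        future = pool.submit(slow_planner, 0)
        plan: Optional[List[str]] = None
        for step in range(max_steps):
            if plan is None and future.done():
                plan = list(future.result())
            if plan:
                # A plan is available: execute it instead of the policy.
                trace.append("plan:" + plan.pop(0))
            elif plan is None:
                # Planner still running: fall back to the reactive policy.
                trace.append("rl:" + reactive_policy(step))
                time.sleep(0.01)  # simulated environment step time
            else:
                break  # plan fully executed
    return trace


if __name__ == "__main__":
    print(run_episode())
```

The control loop never waits on the planner: it polls `future.done()` once per step and switches to the plan only when the result is already available.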

Key Components of PACT

Plan: The SLM generates candidate action plans based on the current state.
Align: Plans are aligned with the environment's constraints and goals.
Commit: A plan is committed only after verification through simulation.
Think: The system continuously refines its planning through deliberation.

The architecture is designed to be modular, allowing the RL policy and SLM to operate independently while sharing a common interface for plan execution.
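A shared execution interface of this kind might look like the following sketch. The `PlanExecutor` class and its `commit` and `act` methods are hypothetical names introduced here for illustration; the source does not describe the actual interface.

```python
from typing import Callable, List

Action = str
State = int


class PlanExecutor:
    """Hypothetical wrapper that routes between a committed plan and the RL policy."""

    def __init__(self, policy: Callable[[State], Action]) -> None:
        # The policy is used as-is; it is never retrained or modified here.
        self._policy = policy
        self._plan: List[Action] = []

    def commit(self, plan: List[Action]) -> None:
        """Accept a plan that has already passed verification."""
        self._plan = list(plan)

    def act(self, state: State) -> Action:
        """Return the next planned action, or defer to the RL policy."""
        if self._plan:
            return self._plan.pop(0)
        return self._policy(state)


if __name__ == "__main__":
    executor = PlanExecutor(lambda state: "left")
    print(executor.act(0))             # no plan yet, so the policy decides
    executor.commit(["down", "right"])
    print([executor.act(0) for _ in range(3)])
```

Keeping the policy behind a plain callable means either component could be swapped out without touching the other, which is one reading of the modularity the article describes.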

Conclusion

The PACT results suggest that hybrid architectures combining a fast reactive policy with a slow deliberative model can outperform either component alone, at least on the FrozenLake configurations tested. By using a small language model for planning, the system gains language-model reasoning without the computational overhead of larger models. The work points toward integrating language model deliberation into reinforcement learning systems for real-world applications where reliability and adaptability are critical.

