Final vision of a hierarchical multi-agent reinforcement learning framework for maintenance and repair (MRO)
Abstract
This paper presents a hierarchical multi-agent reinforcement learning approach to optimizing equipment maintenance and repair under uncertainty. The proposed model combines strategic planning and tactical coordination with a safe learning mechanism.
Short Description
This paper proposes a hierarchical multi-agent reinforcement learning (HRL–MARL) framework for maintenance and repair (MRO) optimization in industrial environments under partial observability, stochastic failures, and hard operational constraints. The production–maintenance system is modeled as a Dec-POMDP, where multiple agents (equipment units and maintenance resources) act on local observations augmented with predictive maintenance (PdM) signals (e.g., RUL estimates and failure probabilities). To ensure safe operation and regulatory compliance, the decision process is constrained via a CMDP formulation using action masking (shielding) and Lagrangian penalty updates.
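As an illustration of the two constraint mechanisms named above, the following is a minimal sketch, not the paper's implementation. It assumes a discrete action set, a boolean shield mask supplied by some external rule check, and a single scalar constraint cost with a fixed budget. The function names, the learning rate `lr`, and the budget are hypothetical and chosen only for this example.

```python
from typing import Sequence


def masked_argmax(q_values: Sequence[float], mask: Sequence[bool]) -> int:
    """Return the index of the best-valued action among those the shield allows."""
    allowed = [i for i, ok in enumerate(mask) if ok]
    if not allowed:
        # Assumption: the shield always leaves at least one safe action.
        raise ValueError("shield masked out every action")
    return max(allowed, key=lambda i: q_values[i])


def lagrangian_reward(reward: float, cost: float, lam: float) -> float:
    """Penalized reward r - lambda * c used in place of the raw reward."""
    return reward - lam * cost


def update_multiplier(lam: float, avg_cost: float, budget: float, lr: float = 0.05) -> float:
    """Dual ascent step: raise lambda when the average cost exceeds the budget.

    The result is projected onto [0, inf) so the multiplier never turns negative.
    """
    return max(0.0, lam + lr * (avg_cost - budget))


if __name__ == "__main__":
    # Action 1 has the highest value but is forbidden by the (hypothetical) shield.
    action = masked_argmax([1.0, 5.0, 3.0], [True, False, True])
    lam = update_multiplier(0.0, avg_cost=0.5, budget=0.2, lr=1.0)
    print(action, lam, lagrangian_reward(1.0, 2.0, lam))
```

In this sketch the mask enforces hard constraints at action-selection time, while the multiplier handles constraints that only need to hold on average. How the framework actually divides constraints between the two mechanisms is not specified in this description.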
The control architecture is split into two time scales: a strategic level (SMDP over options) selects high-level maintenance decisions (e.g., maintenance windows, prioritization, resource allocation), while a tactical level executes these options through multi-agent coordination under CTDE (centralized training, decentralized execution). A human-in-the-loop approval operator is introduced at the strategic level to validate or reject proposed options; rejections trigger a safe fallback policy and are also incorporated into the reward to discourage actions that increase operator intervention.
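The approval step at the strategic level can be sketched as follows. This is a hedged illustration under stated assumptions: options are plain strings, the operator is modeled as a callable returning approve or reject, the fallback is a single fixed option, and rejection subtracts a constant penalty from the reward. All names (`strategic_step`, `rejection_penalty`, the option labels) are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StrategicStep:
    executed_option: str  # option actually handed to the tactical level
    approved: bool        # whether the operator accepted the proposal
    reward: float         # strategic reward after any rejection penalty


def strategic_step(
    proposed: str,
    approve: Callable[[str], bool],
    fallback_option: str,
    base_reward: Callable[[str], float],
    rejection_penalty: float = 1.0,
) -> StrategicStep:
    """Apply the approval operator to one proposed option.

    If the operator rejects the proposal, the safe fallback is executed instead
    and a penalty is subtracted, so the policy is discouraged from proposing
    options that require operator intervention.
    """
    approved = approve(proposed)
    executed = proposed if approved else fallback_option
    penalty = 0.0 if approved else rejection_penalty
    return StrategicStep(executed, approved, base_reward(executed) - penalty)


if __name__ == "__main__":
    # Hypothetical operator rule and reward table, for illustration only.
    operator = lambda option: option != "defer_maintenance"
    rewards = {"schedule_window": 2.0, "inspect_only": 0.5}
    reward_fn = lambda option: rewards.get(option, 0.0)

    print(strategic_step("schedule_window", operator, "inspect_only", reward_fn))
    print(strategic_step("defer_maintenance", operator, "inspect_only", reward_fn))
```

A constant penalty is the simplest choice here. A cost that scales with the operator's workload or with the criticality of the rejected option would fit the same interface.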
A simulation study on a digital twin of an industrial equipment fleet compares the proposed method against baseline maintenance policies (reactive and schedule-based). The results indicate that, under limited resources, the learned policy shifts from emergency repairs toward preventive planning, which reduces failures while respecting the constraints and improves robustness in high-uncertainty regimes.