Recent advances in autonomous systems and reinforcement learning (RL) have paved the way for deploying intelligent multi-agent systems (MASs) across a wide range of real-world applications, including urban mobility management, post-disaster response, and environmental monitoring. These systems promise significant benefits through distributed decision-making, flexible coordination, and scalable operation. However, as the environments in which they operate become more dynamic, data-driven, and uncertain, ensuring effective coordination among agents and improving the adaptability of the underlying methods have emerged as central challenges. RL, as a paradigm for sequential decision-making, offers a powerful framework for learning from interaction. Yet traditional RL methods often fall short in real-world MAS scenarios because of sparse rewards, limited communication bandwidth, and the need to generalize across tasks and environments. To address these challenges, this research presents a structured investigation into developing adaptive and coordinated MASs via RL under constrained and uncertain conditions. First, a Bi-Layer Joint Training Reinforcement Learning (BJoT-RL) framework is developed for post-disaster rescue; it enables joint task allocation and pathfinding through pre-trained policies that minimize communication overhead and allow modular deployment. Second, a Multi-Behavior Multi-Agent Reinforcement Learning (MBMARL) framework is introduced, which leverages offline RL and policy diversity to improve adaptability and robustness in informed search tasks with complex spatial uncertainty and sparse reward signals. Third, a Multi-Personality Multi-Agent Meta-Reinforcement Learning (MPMA-MRL) framework enhances generalization and interpretability by meta-training a set of distinct personality policies and equipping agents with a context-aware personality selector for coordinated action.
Fourth, building upon BJoT-RL, the RL-based Task-Allocation and Path-Finding under Uncertainty (RL-TAPU) framework is introduced to improve adaptability and real-world readiness by combining task-aware meta-RL planning with large-scale rescue deployment capabilities. Finally, the fifth contribution, Meta-Reinforcement Learning with Explicit Task Inference (Meta-ETI), introduces a latent environment-task inference mechanism that improves the accuracy and speed of meta-policy adaptation. By performing explicit task inference with a feature extractor, this method enables more efficient and targeted generalization across diverse tasks. The overarching goal of this dissertation is to explore how different RL paradigms, ranging from joint pre-training to offline RL and meta-learning, can be systematically combined to meet the demands of real-world MAS applications. The proposed methods are validated in representative domains, including post-disaster response, underwater search, intelligent transportation systems, and unmanned aerial vehicle (UAV) coverage for IoT devices. This dissertation offers a unified RL perspective on scalable coordination and fast adaptation in MASs, contributing both practical algorithms and theoretical insights into multi-agent learning under realistic operational constraints.
Songjun Huang (Thu,) studied this question.