March 3, 2026 | Open Access

Multi-agent-based reinforcement learning for continuous dynamic sensor coverage

Key Points

The deep reinforcement learning implementation effectively maintains continuous sensor coverage over extended periods.
In simulations, the MADDPG-based solution outperformed rule-based control methods on several performance metrics, including the highest overall evaluation reward.
The MADDPG actor and critic networks consist of linear and fully connected multi-layer perceptron networks.
Results highlight the potential of multi-agent learning solutions in enhancing dynamic sensor coverage strategies.
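The key points describe MADDPG with actor and critic networks built from fully connected multi-layer perceptrons. As a minimal sketch of the general MADDPG structure only (not the thesis implementation), the dependency-free Python below shows decentralized actors that see only their own observation, centralized critics that see every agent's observation and action, and the bootstrapped critic target computed with target networks. The layer sizes, discount factor, ReLU/tanh activations, and the names `init_mlp`, `act`, `q_value`, and `critic_target` are all assumptions for illustration; the sketch covers forward passes only and omits the replay buffer, exploration noise, and gradient updates.

```python
import copy
import math
import random

# Hypothetical sizes: the thesis abstract does not state them.
N_AGENTS, OBS_DIM, ACT_DIM, HIDDEN = 3, 4, 2, 16
GAMMA = 0.95  # assumed discount factor


def init_mlp(sizes, rng):
    """Weights and biases for a fully connected MLP with the given layer sizes."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        bound = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def mlp_forward(layers, x, out_activation=None):
    """Linear layers with ReLU on hidden layers and an optional output activation."""
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return [out_activation(v) for v in x] if out_activation else x


rng = random.Random(0)
# Decentralized actors: each maps its own observation to a continuous action in [-1, 1].
actors = [init_mlp([OBS_DIM, HIDDEN, HIDDEN, ACT_DIM], rng) for _ in range(N_AGENTS)]
# Centralized critics: each sees every agent's observation and action.
critics = [init_mlp([N_AGENTS * (OBS_DIM + ACT_DIM), HIDDEN, HIDDEN, 1], rng)
           for _ in range(N_AGENTS)]
# Target networks start as exact copies; in training they would be soft-updated.
target_actors, target_critics = copy.deepcopy(actors), copy.deepcopy(critics)


def act(actor_nets, observations):
    """Each agent acts on its local observation only."""
    return [mlp_forward(net, obs, math.tanh) for net, obs in zip(actor_nets, observations)]


def q_value(critic, observations, actions):
    """Centralized Q(x, a_1, ..., a_N) on the concatenated joint observation and action."""
    joint = [v for obs in observations for v in obs] + [v for a in actions for v in a]
    return mlp_forward(critic, joint)[0]


def critic_target(i, reward, next_observations, done):
    """Target y_i = r_i + gamma * Q'_i(x', mu'_1(o'_1), ..., mu'_N(o'_N))."""
    next_actions = act(target_actors, next_observations)
    bootstrap = 0.0 if done else GAMMA * q_value(target_critics[i], next_observations, next_actions)
    return reward + bootstrap
```

For example, `act(actors, obs)` with one observation list per agent returns one bounded action per agent, and `q_value(critics[0], obs, actions)` scores that joint action for agent 0. Giving the critic the joint observation and action is what lets MADDPG train decentralized policies while keeping each critic's learning problem stationary.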

Abstract

Surveillance and reconnaissance with unmanned aerial vehicles, commonly called drones, are becoming increasingly popular in both civilian and military domains. The aim of continuous dynamic sensor coverage is to repeatedly visit and cover all parts of an area of interest over a long period of time, with the ultimate goal of visiting every part of the area as often and as evenly as possible within a given timeframe. Most previous research on this problem has used rule-based control methods, often combined with effective waypoint generation within the area of interest. The main goal of this thesis was to extend that research by implementing a deep reinforcement learning solution and comparing it with several rule-based methods. The solution developed here for the multi-agent continuous dynamic sensor coverage problem is based on the multi-agent deep deterministic policy gradient (MADDPG) algorithm, a state-of-the-art algorithm. The actor and critic networks of the MADDPG algorithm consisted of linear and fully connected multi-layer perceptron networks. Simulation results show that the MADDPG-based reinforcement learning solution can effectively and continuously cover an area over long periods of time. The deep reinforcement learning implementation also outperformed the rule-based control methods on several performance metrics, including the highest overall evaluation reward and the highest average agent speeds.
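The abstract states the coverage goal qualitatively (visit every part of the area as often and as evenly as possible) but does not give the thesis's reward function or metric definitions. As a hypothetical illustration only, one common way to quantify that goal is per-cell idleness: discretize the area into a grid and count, for each cell, the time steps since any agent last sensed it. Low mean idleness means cells are revisited often, while low maximum idleness and low spread mean coverage is even. The grid discretization, the circular sensor footprint, and the names `CoverageGrid` and `coverage_reward` are assumptions, not the author's method.

```python
import math
import statistics


class CoverageGrid:
    """Tracks, per grid cell, how many time steps have passed since it was last sensed."""

    def __init__(self, width, height):
        self.width, self.height = width, height
        # Assumption: every cell starts as freshly covered (idleness 0).
        self.idleness = [[0] * width for _ in range(height)]

    def step(self, agent_positions, sensor_radius):
        """Advance one time step; cells whose centre is within range of any agent reset to 0."""
        for y in range(self.height):
            for x in range(self.width):
                cx, cy = x + 0.5, y + 0.5  # cell centre
                covered = any(math.hypot(cx - ax, cy - ay) <= sensor_radius
                              for ax, ay in agent_positions)
                self.idleness[y][x] = 0 if covered else self.idleness[y][x] + 1

    def metrics(self):
        values = [v for row in self.idleness for v in row]
        return {
            "mean_idleness": statistics.mean(values),   # how often cells are revisited
            "max_idleness": max(values),                # the most neglected cell
            "std_idleness": statistics.pstdev(values),  # how evenly coverage is spread
        }


def coverage_reward(grid):
    """Hypothetical reward: penalize the average time since cells were last covered."""
    return -grid.metrics()["mean_idleness"]
```

For example, a 4 x 1 grid with one agent parked on the first cell (sensor radius 0.6) for three steps ends with idleness `[0, 3, 3, 3]`, a mean of 2.25 and a maximum of 3, so a stationary agent is penalized as the uncovered cells age.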

Author: Ludvig Skare

synapsesocial.com/papers/69a75d52c6e9836116a27241