Surveillance and reconnaissance with unmanned aerial vehicles (UAVs), commonly called drones, are becoming increasingly popular in both civilian and military domains. The aim of continuous dynamic sensor coverage is to visit and cover all parts of an area of interest repeatedly over a long period of time, so that every part of the area is visited as often and as evenly as possible within a given timeframe. Most previous research on this problem has used rule-based control methods, often combined with effective waypoint generation within the area of interest. The main goal of this thesis was to extend the research on continuous dynamic sensor coverage by implementing a deep reinforcement learning approach and comparing it with various rule-based methods. A solution to the multi-agent continuous dynamic sensor coverage problem is developed based on the multi-agent deep deterministic policy gradient (MADDPG) algorithm, which is currently a state-of-the-art algorithm. The actor-critic networks of the MADDPG algorithm consisted of linear and fully connected multi-layer perceptron networks. Experimental results from simulations show that the reinforcement learning solution based on MADDPG can cover an area effectively and continuously over long periods of time. The deep reinforcement learning implementation also outperformed the rule-based control methods on several performance metrics, including the highest overall evaluation reward and the highest average agent speeds.
Ludvig Skare studied this question.
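To make the coverage objective in the abstract concrete, the following is a minimal sketch of one way "visiting all parts of an area as often and evenly as possible" could be measured. It is not taken from the thesis: the class `CoverageGrid`, the grid discretization, the square sensor footprint, and the `mean_age` and `max_age` metrics are all illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class CoverageGrid:
    """Hypothetical model: each cell stores the number of steps since it was last sensed."""
    width: int
    height: int
    age: list = field(init=False)

    def __post_init__(self):
        # Every cell starts with age 0 (assumed freshly covered).
        self.age = [[0] * self.width for _ in range(self.height)]

    def step(self, agent_cells, sensor_radius=1):
        """Age every cell by one step, then reset cells inside any agent's footprint.

        agent_cells: iterable of (x, y) grid positions, one per agent.
        The footprint is a square of half-width sensor_radius (an assumption).
        """
        for row in self.age:
            for x in range(self.width):
                row[x] += 1
        for ax, ay in agent_cells:
            y_lo, y_hi = max(0, ay - sensor_radius), min(self.height, ay + sensor_radius + 1)
            x_lo, x_hi = max(0, ax - sensor_radius), min(self.width, ax + sensor_radius + 1)
            for y in range(y_lo, y_hi):
                for x in range(x_lo, x_hi):
                    self.age[y][x] = 0

    def mean_age(self):
        """Average staleness; lower means the area is visited more often."""
        cells = [a for row in self.age for a in row]
        return sum(cells) / len(cells)

    def max_age(self):
        """Worst-case staleness; lower means the coverage is more even."""
        return max(a for row in self.age for a in row)
```

Under these assumptions, a controller (rule-based or learned) that keeps both `mean_age()` and `max_age()` low over a long run would be covering the area both frequently and evenly; a negative staleness value is one conceivable reward signal, though the reward actually used in the thesis is not specified here.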