Caption
Figure 1: A hierarchical overview of our approach. The learned high-level mean-field control policy sends movement instructions to the (UAV) swarm, while each agent uses a real-time collision avoidance algorithm – here artificial potential fields (APF) – to avoid collisions with others.