Klink, Pascal_Metric Self-Paced Reinforcement Learning

Caption

Figure 1: An example of the bipedal walker environment, in which we parameterize the size
(x-axis) as well as the spacing (y-axis) of obstacles. The colored dots represent tasks that
have been generated by our method. Dark colors indicate tasks that have been trained on in early
stages, and bright colors show tasks that have been generated towards the end of training. We
see that our method increases the obstacle size as training progresses, simultaneously
adapting the maximum size based on the spacing between the obstacles.