Joao_Carvalho_Measure-Valued Derivatives for Reinforcement Learning_Figure1

Figure 1: Policy evaluation results during training on different deep RL tasks. Shown are the average reward as a function of the number of samples collected and the 95% confidence interval over 25 random seeds.