Joao_Carvalho_Measure-Valued Derivatives for Reinforcement Learning_Figure1
Figure 1: Policy evaluation results during training on different deep RL tasks. Shown are the average reward as a function of the number of samples collected and the 95% confidence interval over 25 random seeds.