Figure 2: Training curves (evolution of reward over time) for different graph architectures. Our chosen multi-head attention (MHA) graph architecture outperforms both the single-head attention (SHA) variant and the S2V architecture.