Hello there,
In chapter 2.1 you define the reward as
$G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots + \gamma^{H-1} R_H$
Isn’t that the return rather than the reward? At least Google defines it as the return.
So the reward should then just be the “points” the agent receives for a single transition from state $s$ to $s'$?
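
To make the distinction I am asking about concrete, here is a minimal sketch of how I currently understand it. The function name `discounted_return`, the reward values, and the choice of $\gamma = 0.5$ are my own illustrative assumptions, not something taken from the chapter: each list entry is one per-step reward, and the return is the discounted sum over all of them.

```python
def discounted_return(rewards, gamma):
    """Discounted sum of per-step rewards.

    rewards[k] stands for R_{t+k+1}, the reward for the k-th transition
    after time t (hypothetical values, chosen for illustration only).
    """
    g = 0.0
    # Accumulate backwards: G = r + gamma * (return of the remaining steps)
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Three single-transition rewards (the "points" for each step)
rewards = [1.0, 0.0, 2.0]

# The return aggregates them: 1.0 + 0.5 * 0.0 + 0.25 * 2.0
print(discounted_return(rewards, gamma=0.5))  # prints 1.5
```

If that reading is right, each entry of `rewards` is a reward and only the final number is the return $G_t$.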