I am creating a basic gridworld RL problem and I need to calculate the return for some given episode. I currently have the array of rewards, and I would like to element-wise multiply this with a list of the form:
[gamma**0, gamma**1, gamma**2, ....]
In order to get:
[r_0*gamma**0, r_1*gamma**1, r_2*gamma**2, ....]
and then use np.sum() to get the entire return.
How can I do that first step, i.e. build the list of gamma powers? I tried np.logspace, but it isn't quite what I want (or I'm using it wrong).
CodePudding user response:
Suppose the reward array and gamma look like this (gamma can be any value):
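import numpy as np

n = 20  # episode length
reward = np.random.randint(0, 10, n)  # example rewards r_0, ..., r_{n-1}
gamma = 0.9  # discount factor, usually in [0, 1]
# weights [gamma**0, gamma**1, ..., gamma**(n-1)], multiplied element-wise and summed
np.sum(reward * (gamma ** np.arange(n)))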
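
If it helps, here is a minimal sketch of the same idea wrapped in a function. The name discounted_return and the sample rewards are only for illustration, they are not from the question. It also shows one way np.logspace can produce the same weights: it computes base**linspace(start, stop, num), so passing base=gamma with exponents from 0 to T-1 should give the gamma powers. This may or may not be what went wrong in the original attempt.

```python
import numpy as np


def discounted_return(rewards, gamma):
    """Return sum_t gamma**t * rewards[t] for a single episode."""
    rewards = np.asarray(rewards, dtype=float)
    # Discount weights [gamma**0, gamma**1, ..., gamma**(T-1)]
    discounts = gamma ** np.arange(len(rewards))
    # A dot product is the element-wise multiply followed by the sum
    return float(np.dot(rewards, discounts))


# Hypothetical sample episode, chosen so the result is easy to check by hand
rewards = [1.0, 0.0, 2.0, 3.0]
gamma = 0.5

# 1*1 + 0*0.5 + 2*0.25 + 3*0.125
print(discounted_return(rewards, gamma))  # prints 1.875

# The same weights via np.logspace: base**linspace(0, T-1, T)
T = len(rewards)
weights = np.logspace(0, T - 1, num=T, base=gamma)
print(np.allclose(weights, gamma ** np.arange(T)))  # prints True
```

np.dot avoids creating the intermediate array of discounted rewards, but for a single episode the difference from np.sum(reward * weights) is negligible, so use whichever reads better.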