Discounted Return
Hard
~12 min
code completion
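In reinforcement learning, the return $G_t$ at time $t$ is the total future reward, with each later reward discounted by a further factor of $\gamma$:
$$G_t = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \dots = \sum_{k \ge 0} \gamma^k r_{t+k}$$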
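The discount factor $\gamma \in [0, 1]$ controls how much future rewards matter:
$\gamma = 0$: only the immediate reward counts.
$\gamma = 1$: all rewards count equally, so the return is their plain sum.
$0 < \gamma < 1$: a reward $k$ steps ahead is weighted by $\gamma^k$, so later rewards matter progressively less.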
For the full episode starting at $t = 0$ with rewards $r_0, r_1, \dots, r_{T-1}$:
$$G_0 = \sum_{t=0}^{T-1} \gamma^t r_t$$
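The episode sum can also be written recursively, which is not stated in the problem but follows from factoring $\gamma$ out of every term after the first. Here $r_t$ is the reward at step $t$, $T$ is the number of rewards, and $G_T = 0$ is an assumed convention for the return after the episode ends:
$$G_0 = r_0 + \gamma \left( r_1 + \gamma \left( r_2 + \dots + \gamma \, r_{T-1} \right) \right)$$
$$G_t = r_t + \gamma \, G_{t+1}, \qquad G_T = 0$$
For rewards $[1, 1, 1]$ and $\gamma = 0.9$ this gives $G_2 = 1$, $G_1 = 1 + 0.9 \cdot 1 = 1.9$, and $G_0 = 1 + 0.9 \cdot 1.9 = 2.71$.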
Your task:
Implement discounted_return(rewards, gamma) that returns the total discounted return $G_0$ for the list rewards.
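One possible approach, offered as a minimal sketch rather than the reference solution, is to walk the rewards from last to first and apply the recursion G_t = r_t + gamma * G_{t+1}. This avoids computing any powers of gamma. Returning 0.0 for an empty reward list is an assumption; the problem does not say what should happen in that case.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Return G_0 = sum over t of gamma**t * rewards[t]."""
    g = 0.0  # return after the final step (assumed to be zero)
    # Accumulate backward: each step adds its own reward and discounts
    # everything that comes after it by one more factor of gamma.
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

The loop runs once per reward, so the cost is linear in the episode length and independent of gamma.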
Example Tests
gamma=1: sum of all rewards
Input: {"gamma":1,"rewards":[1,1,1,1]}
Expected: 4
gamma=0: only first reward counts
Input: {"gamma":0,"rewards":[5,10,20]}
Expected: 5
gamma=0.9: 1 + 0.9 + 0.81 = 2.71
Input: {"gamma":0.9,"rewards":[1,1,1]}
Expected: 2.71
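The examples above can be checked mechanically. The sketch below is illustrative only: check_examples and direct_discounted_return are hypothetical names, and the direct sum is a stand-in for whatever function is under test. It compares with a tolerance because sums such as 1 + 0.9 + 0.81 are computed in floating point and are not guaranteed to equal 2.71 exactly.

```python
import math


def direct_discounted_return(rewards, gamma):
    # Direct evaluation of sum_t gamma**t * r_t.
    # Python evaluates 0 ** 0 as 1, so gamma = 0 keeps only the first reward.
    return sum(gamma ** t * r for t, r in enumerate(rewards))


# The three example tests, in the same input shape as shown above.
EXAMPLES = [
    ({"gamma": 1, "rewards": [1, 1, 1, 1]}, 4),
    ({"gamma": 0, "rewards": [5, 10, 20]}, 5),
    ({"gamma": 0.9, "rewards": [1, 1, 1]}, 2.71),
]


def check_examples(fn):
    # True when fn matches every expected value within a relative tolerance.
    return all(
        math.isclose(fn(case["rewards"], case["gamma"]), expected, rel_tol=1e-9)
        for case, expected in EXAMPLES
    )
```

Calling check_examples(direct_discounted_return) exercises all three cases; any other implementation with the same (rewards, gamma) signature can be passed in its place.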