Hard

Discounted Return

~12 min

code completion

In reinforcement learning, the return $G_{t}$ at time $t$ in an episode of $T$ steps is the sum of future rewards, each weighted by a power of the discount factor $γ \in [0, 1]$:

$$G_{t} = \sum_{k=0}^{T-t-1} γ^{k} r_{t+k}$$
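The sum above satisfies the recursion $G_{t} = r_{t} + γ G_{t+1}$ with $G_{T} = 0$, so every $G_{t}$ can be computed in a single backward pass. The following is a minimal sketch of that idea; the helper name `returns_to_go` is hypothetical and not part of the problem statement.

```python
def returns_to_go(rewards, gamma):
    """Return [G_0, G_1, ..., G_{T-1}] for one episode.

    Hypothetical helper: uses G_t = r_t + gamma * G_{t+1}, with G_T = 0.
    """
    returns = [0.0] * len(rewards)
    running = 0.0  # plays the role of G_{t+1}; starts at G_T = 0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


print(returns_to_go([1, 1, 1], 0.5))  # [1.75, 1.5, 1.0]
```

The first entry of the result is $G_{0}$, the quantity this problem asks for.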

The discount factor $γ$ controls how much future rewards matter:

$γ = 0$: only the immediate reward matters

$γ \approx 1$: all future rewards matter nearly equally
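To make the effect of $γ$ concrete, the sketch below lists the weights $γ^{k}$ applied to the first few rewards. The function name `discount_weights` and the sample values of `gamma` are assumptions chosen for illustration.

```python
def discount_weights(gamma, horizon):
    """Return [gamma**0, gamma**1, ..., gamma**(horizon - 1)].

    Hypothetical helper for illustration only.
    """
    return [gamma ** k for k in range(horizon)]


for gamma in (0.0, 0.5, 0.99):
    # Rounded only for display; the underlying weights are unrounded floats.
    print(gamma, [round(w, 4) for w in discount_weights(gamma, 4)])
```

With `gamma = 0.0` every weight after the first is zero, while with `gamma = 0.99` the weights stay close to 1 over this short horizon.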

For the full episode starting at $t = 0$ with rewards $[r_{0}, r_{1}, \dots, r_{T - 1}]$ :

$$G_{0} = r_{0} + γ r_{1} + γ^{2} r_{2} + \dots + γ^{T-1} r_{T-1}$$

Your task:

Implement discounted_return(rewards, gamma) that returns the total discounted return $G_{0}$ .
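One possible approach, shown here as a sketch rather than a reference solution, accumulates the terms of $G_{0}$ left to right while keeping a running power of $γ$. It assumes `rewards` is a finite sequence of numbers and returns `0.0` for an empty episode, a convention the problem statement does not specify.

```python
import math


def discounted_return(rewards, gamma):
    """Return G_0 = sum over k of gamma**k * rewards[k]."""
    total = 0.0
    discount = 1.0  # gamma**k for the current index k
    for reward in rewards:
        total += discount * reward
        discount *= gamma
    return total


# Floating-point sums are compared with a tolerance rather than ==.
assert math.isclose(discounted_return([1, 1, 1], 0.9), 2.71)
```

Updating `discount` by multiplication avoids recomputing `gamma ** k` at every step.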

Example Tests

gamma=1: sum of all rewards

Input: {"gamma":1,"rewards":[1,1,1,1]}

Expected: 4

gamma=0: only first reward counts

Input: {"gamma":0,"rewards":[5,10,20]}

Expected: 5

gamma=0.9: 1 + 0.9 + 0.81 = 2.71

Input: {"gamma":0.9,"rewards":[1,1,1]}

Expected: 2.71
