Discounted Return

Hard
~12 min
code completion

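In reinforcement learning, the return $G_t$ at time $t$ is the total future reward, with each later reward discounted by one more factor of $\gamma$:

$$G_t = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \cdots = \sum_{k=0}^{\infty} \gamma^k r_{t+k}$$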

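The discount factor $\gamma$ controls how much future rewards matter:

  • $\gamma = 0$: only the immediate reward matters
  • $\gamma \to 1$: all future rewards matter nearly equally

For the full episode starting at $t = 0$ with rewards $r_0, r_1, \ldots, r_{T-1}$:

$$G_0 = \sum_{t=0}^{T-1} \gamma^t r_t$$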
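
As a worked illustration with made-up numbers (not one of the problem's tests), take rewards $(5, 10, 20)$ and $\gamma = 0.5$:

$$G_0 = 0.5^0 \cdot 5 + 0.5^1 \cdot 10 + 0.5^2 \cdot 20 = 5 + 5 + 5 = 15$$

With $\gamma = 1$ the same rewards would sum to $35$, so halving the weight at each step cuts the return by more than half here.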

    Your task:

    Implement discounted_return(rewards, gamma) that returns the total discounted return $G_0$ of the episode.
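
One possible way to approach the task is a single forward pass that keeps a running power of gamma. The sketch below is only an illustration, not the reference solution; it assumes rewards is a finite sequence of numbers and that an empty episode should return 0.0, which the problem statement does not specify.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum gamma**t * rewards[t] over the episode, starting at t = 0."""
    total = 0.0
    discount = 1.0  # equals gamma**t for the current step; gamma**0 == 1
    for reward in rewards:
        total += discount * reward
        discount *= gamma  # advance to gamma**(t + 1)
    return total  # assumption: an empty episode yields 0.0
```

Keeping the running product avoids recomputing gamma**t at every step, and with gamma = 0 the discount becomes 0 after the first reward, so only that reward contributes.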

    Example Tests

    gamma=1: sum of all rewards

    Input: {"gamma":1,"rewards":[1,1,1,1]}

    Expected: 4

    gamma=0: only first reward counts

    Input: {"gamma":0,"rewards":[5,10,20]}

    Expected: 5

    gamma=0.9: 1 + 0.9 + 0.81 = 2.71

    Input: {"gamma":0.9,"rewards":[1,1,1]}

    Expected: 2.71
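
The three examples above can be checked locally with a short script. This sketch uses an alternative formulation, the backward recursion G_t = r_t + gamma * G_(t+1), under the hypothetical name discounted_return_backward. It compares results with a tolerance because a value such as 2.71 is not exactly representable in floating point; the tolerance is an assumption, since the page does not say how the grader compares results.

```python
import math


def discounted_return_backward(rewards, gamma):
    """Accumulate from the last reward to the first: g = r_t + gamma * g."""
    g = 0.0
    for reward in reversed(rewards):
        g = reward + gamma * g
    return g


# The example tests from the problem statement, paired with expected values.
examples = [
    ({"gamma": 1, "rewards": [1, 1, 1, 1]}, 4),
    ({"gamma": 0, "rewards": [5, 10, 20]}, 5),
    ({"gamma": 0.9, "rewards": [1, 1, 1]}, 2.71),
]


def check_examples():
    """Return True if every example matches within a relative tolerance."""
    for kwargs, expected in examples:
        result = discounted_return_backward(**kwargs)
        assert math.isclose(result, expected, rel_tol=1e-9), (kwargs, result)
    return True
```

Calling check_examples() raises an AssertionError showing the offending input if any example disagrees.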
