Easy

L1 Regularization Gradient

~10 min

code completion

Gradient of L1 Regularization

To include L1 regularization in gradient descent, we need the derivative of the penalty term with respect to each weight:

$$\frac{\partial}{\partial W_{ij}} \left( \lambda \, |W_{ij}| \right) = \lambda \cdot \operatorname{sign}(W_{ij})$$

where $\operatorname{sign}(x) = \begin{cases} +1 & x > 0 \\ 0 & x = 0 \\ -1 & x < 0 \end{cases}$

This gradient is added to the data-loss gradient during backpropagation:

$$\nabla_{W} L_{\text{total}} = \nabla_{W} L_{\text{data}} + \lambda \cdot \operatorname{sign}(W)$$
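As a rough illustration of the sum above, the following sketch adds the L1 term to a data-loss gradient and takes one gradient descent step. The helper name `total_gradient`, the values in `grad_data`, and `learning_rate` are hypothetical placeholders chosen for this example; in a real model the data-loss gradient would come from backpropagation.

```python
import numpy as np

def total_gradient(grad_data, W, lambda_):
    """Add the L1 penalty gradient to a given data-loss gradient (hypothetical helper)."""
    # np.sign is applied elementwise and returns 0 where W is exactly 0.
    return grad_data + lambda_ * np.sign(W)

# Placeholder values, assumed for illustration only.
W = np.array([[0.5, -0.2], [0.0, 1.5]])
grad_data = np.array([[0.1, 0.1], [0.1, 0.1]])
lambda_ = 0.01
learning_rate = 0.1

grad_total = total_gradient(grad_data, W, lambda_)

# One plain gradient descent update using the combined gradient.
W_new = W - learning_rate * grad_total
```

Note that the zero entry of `W` receives only the data-loss gradient, because the sign of zero is zero.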

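Key contrast with L2: The L1 gradient has constant magnitude λ for every nonzero weight, so each update pushes the weight toward zero by the same amount regardless of how large the weight is. This encourages sparsity, because even small weights keep receiving the full push and can be driven all the way to zero; at exactly zero the penalty contributes no gradient, since sign(0) = 0. The L2 gradient, by contrast, is proportional to the weight itself, so its pull weakens as the weight shrinks and the weight only approaches zero asymptotically.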
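The difference in gradient magnitude can be seen numerically. This is a minimal sketch that assumes the L2 penalty is written as λ·w², so its gradient is 2λw; the weights and `lambda_` are arbitrary illustrative values.

```python
import numpy as np

# Arbitrary example weights, from large to very small.
w = np.array([2.0, 0.5, 0.01, -0.01, -2.0])
lambda_ = 0.1

# L1 penalty lambda_ * |w|: gradient has the same magnitude for every nonzero weight.
l1_grad = lambda_ * np.sign(w)

# L2 penalty lambda_ * w**2 (assumed form): gradient scales with the weight.
l2_grad = 2 * lambda_ * w

for wi, g1, g2 in zip(w, l1_grad, l2_grad):
    print(f"w={wi:+.2f}  L1 grad={g1:+.3f}  L2 grad={g2:+.3f}")
```

For the weights near zero, the L1 gradient is still as large as it is for the big weights, while the L2 gradient has almost vanished.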

Your task:

Implement l1_gradient(W, lambda_) that returns the gradient of the L1 penalty with respect to each element of W.

Example Tests

Row vector with positive and negative entries, lambda=1

Input: {"W":[[1,-2,3]],"lambda_":1}

Expected: [[1,-1,1]]

Zero maps to zero; lambda=0.5 scales the result

Input: {"W":[[0,5,-5]],"lambda_":0.5}

Expected: [[0,0.5,-0.5]]

2x2 matrix with small lambda

Input: {"W":[[1,-1],[2,-2]],"lambda_":0.1}

Expected: [[0.1,-0.1],[0.1,-0.1]]
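One possible way to approach the task, sketched with NumPy and checked against the three examples above. It assumes that `W` may arrive as a nested list and that returning a NumPy array is acceptable; the grader's expected return type is not stated in the problem, so treat this as an illustration rather than a reference solution.

```python
import numpy as np

def l1_gradient(W, lambda_):
    """Gradient of lambda_ * sum(|W|) with respect to each element of W."""
    # Accept nested lists as well as arrays (an assumption about the input format).
    W = np.asarray(W, dtype=float)
    # np.sign returns +1, 0, or -1 elementwise, matching the sign definition.
    return lambda_ * np.sign(W)

# The three example inputs from the problem statement.
examples = [
    ([[1, -2, 3]], 1),
    ([[0, 5, -5]], 0.5),
    ([[1, -1], [2, -2]], 0.1),
]

results = [l1_gradient(W, lam).tolist() for W, lam in examples]
for r in results:
    print(r)
```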
