L1 Regularization Gradient
Gradient of L1 Regularization
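To include L1 regularization in gradient descent, we need the gradient of the penalty term $\lambda \lVert W \rVert_1 = \lambda \sum_{i,j} |W_{ij}|$ with respect to each weight:

$$\frac{\partial}{\partial W_{ij}} \left( \lambda \sum_{k,l} |W_{kl}| \right) = \lambda \, \operatorname{sign}(W_{ij})$$

where

$$\operatorname{sign}(w) = \begin{cases} 1 & \text{if } w > 0 \\ 0 & \text{if } w = 0 \\ -1 & \text{if } w < 0 \end{cases}$$

The absolute value is not differentiable at zero, so the value 0 at $w = 0$ is a convention (a valid subgradient). This gradient is added to the data-loss gradient during backpropagation:

$$\frac{\partial L}{\partial W} = \frac{\partial L_{\text{data}}}{\partial W} + \lambda \, \operatorname{sign}(W)$$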
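As a rough illustration of how the two gradients combine, the sketch below adds the L1 term to a data-loss gradient element by element. The helper names `sign` and `total_gradient`, and the values in `grad_data`, are made up for this example and are not part of the exercise.

```python
def sign(w):
    """Return 1, 0, or -1 according to the sign of w."""
    return (w > 0) - (w < 0)


def total_gradient(grad_data, W, lambda_):
    """Add the L1 penalty gradient lambda_ * sign(W) to the data-loss gradient.

    Both grad_data and W are 2-D nested lists of the same shape.
    """
    return [
        [g + lambda_ * sign(w) for g, w in zip(g_row, w_row)]
        for g_row, w_row in zip(grad_data, W)
    ]


W = [[0.5, -0.25, 0.0]]
grad_data = [[0.25, 0.25, 0.25]]  # hypothetical data-loss gradient
total = total_gradient(grad_data, W, lambda_=0.5)
# The positive weight gains +0.5, the negative weight gains -0.5,
# and the zero weight keeps only its data-loss gradient.
```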
Key contrast with L2: for any nonzero weight, the L1 gradient has constant magnitude λ, so each update pushes the weight toward zero by the same amount regardless of how large the weight is. The L2 gradient is proportional to the weight, so its pull weakens as the weight shrinks. This is why L1 encourages sparsity: small weights get pushed all the way to zero, while under L2 they only approach zero asymptotically.
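The contrast can be seen in a small penalty-only simulation (no data loss). This is a sketch under stated assumptions: the L2 penalty is taken as (λ/2)·w², so its gradient is λ·w, and the L1 step is clipped at zero so that it does not overshoot and oscillate around zero. The function names and the values of `lr` and `lambda_` are chosen only for illustration.

```python
def sign(w):
    """Return 1, 0, or -1 according to the sign of w."""
    return (w > 0) - (w < 0)


def l1_step(w, lr, lambda_):
    """One penalty-only update using the L1 gradient lambda_ * sign(w)."""
    step = lr * lambda_
    if abs(w) <= step:
        return 0.0  # assumption: clip at zero instead of overshooting
    return w - step * sign(w)


def l2_step(w, lr, lambda_):
    """One penalty-only update using the L2 gradient lambda_ * w."""
    return w - lr * lambda_ * w


w_l1 = w_l2 = 1.0
for _ in range(4):
    w_l1 = l1_step(w_l1, lr=0.5, lambda_=0.5)  # fixed step of 0.25
    w_l2 = l2_step(w_l2, lr=0.5, lambda_=0.5)  # multiplies w by 0.75

print(w_l1, w_l2)
```

With these settings the L1 weight moves in equal steps of 0.25 and is exactly zero after four updates, while the L2 weight is scaled by 0.75 each time and stays positive.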
Your task:
Implement l1_gradient(W, lambda_) that returns the gradient of the L1 penalty with respect to each element of W.
Example Tests
Row vector with positive and negative entries, lambda=1
Input: {"W":[[1,-2,3]],"lambda_":1}
Expected: [[1,-1,1]]
Zero maps to zero; lambda=0.5 scales the result
Input: {"W":[[0,5,-5]],"lambda_":0.5}
Expected: [[0,0.5,-0.5]]
2x2 matrix with small lambda
Input: {"W":[[1,-1],[2,-2]],"lambda_":0.1}
Expected: [[0.1,-0.1],[0.1,-0.1]]
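One possible solution is sketched below in plain Python, assuming W arrives as a 2-D nested list as in the test inputs. It is one way to satisfy the examples above, not the only acceptable implementation; the inner `sign` helper is introduced here for illustration.

```python
def l1_gradient(W, lambda_):
    """Return the elementwise L1 penalty gradient lambda_ * sign(W).

    W is a 2-D nested list; the result has the same shape.
    """
    def sign(w):
        # 1 for positive, -1 for negative, 0 for zero
        return (w > 0) - (w < 0)

    return [[lambda_ * sign(w) for w in row] for row in W]


# Check against the example tests
assert l1_gradient([[1, -2, 3]], 1) == [[1, -1, 1]]
assert l1_gradient([[0, 5, -5]], 0.5) == [[0, 0.5, -0.5]]
assert l1_gradient([[1, -1], [2, -2]], 0.1) == [[0.1, -0.1], [0.1, -0.1]]
```

If W is a NumPy array instead, `lambda_ * np.sign(W)` computes the same thing in one vectorized expression, since `np.sign` also maps zero to zero.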