Fine-tuning Head Gradient
Medium
~20 min
code completion
Fine-tuning: Gradient for the Linear Head
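When fine-tuning only the linear head of a pretrained model, we compute the gradient of the MSE loss with respect to the head weights W:
$$\nabla_W L = \frac{2}{n}\, Z^\top (ZW - y_{\text{true}})$$
where:
Z is the (n, d) matrix of features produced by the frozen backbone
W is the (d, 1) weight matrix of the linear head
y_true is the (n, 1) matrix of targets
n is the number of samples (rows of Z)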
This gradient is then used to update W with gradient descent; the backbone is never touched.
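As a sketch of where this gradient comes from, assume the loss is the mean of squared residuals over the n rows of Z (the resulting 2/n factor is consistent with the "Known gradient" example test). The residual symbol r and the learning rate \eta are illustrative names, not part of the original statement:

$$
\begin{aligned}
r &= ZW - y_{\text{true}} \\
L(W) &= \frac{1}{n}\, r^\top r \\
\nabla_W L &= \frac{2}{n}\, Z^\top r \\
W &\leftarrow W - \eta\, \nabla_W L
\end{aligned}
$$

Only W appears on the left of the update, which is what "the backbone is never touched" means in practice: Z is treated as a constant.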
Your task:
Implement finetune_gradient(Z, W, y_true) that returns the gradient of the MSE loss with respect to W, as an array of shape (d, 1).
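One possible implementation is sketched below. It assumes NumPy is available, that inputs arrive as nested lists or arrays, and that the MSE gradient is (2/n) Z^T (ZW - y_true), which matches the example tests. The learning rate lr in the usage lines is a hypothetical value chosen only for illustration.

```python
import numpy as np

def finetune_gradient(Z, W, y_true):
    """Gradient of the MSE loss w.r.t. the linear head W.

    Z is treated as a constant: it holds the frozen backbone's features.
    """
    Z = np.asarray(Z, dtype=float)            # (n, d) features
    W = np.asarray(W, dtype=float)            # (d, 1) head weights
    y_true = np.asarray(y_true, dtype=float)  # (n, 1) targets
    n = Z.shape[0]
    residual = Z @ W - y_true                 # (n, 1) prediction error
    return (2.0 / n) * (Z.T @ residual)       # (d, 1) gradient

# Usage: one gradient-descent step on the head only.
Z = [[1, 0], [0, 1]]
W = np.array([[2.0], [2.0]])
y_true = [[1], [1]]
grad = finetune_gradient(Z, W, y_true)
lr = 0.1                                      # hypothetical learning rate
W_new = W - lr * grad                         # overprediction moves W down
```

Converting with np.asarray up front lets the same function accept the JSON-style nested lists used in the tests as well as NumPy arrays.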
Example Tests
Perfect predictions: zero gradient
Input: {"W":[[1],[1]],"Z":[[1,0],[0,1]],"y_true":[[1],[1]]}
Expected: [[0],[0]]
Known gradient: overprediction pushes W down
Input: {"W":[[2],[2]],"Z":[[1,0],[0,1]],"y_true":[[1],[1]]}
Expected: [[1],[1]]
Output shape is (d, 1)
Input: {"W":[[1],[1],[1]],"Z":[[1,2,3],[4,5,6]],"y_true":[[6],[15]]}
Expected: [3,1] (the shape of the result; the gradient values are all zero here because Z W equals y_true)
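The "Known gradient" test can be checked by hand. This is a worked sketch that assumes the gradient is (2/n) Z^T (ZW - y_true) with n = 2:

$$
ZW - y_{\text{true}} = \begin{bmatrix} 2 \\ 2 \end{bmatrix} - \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix},
\qquad
\nabla_W L = \frac{2}{2} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}^\top \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}
$$

Both entries are positive, so a gradient-descent step subtracts from W, which is the sense in which overprediction pushes W down.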