Medium

Fine-tuning Head Gradient

~20 min

code completion

Fine-tuning: Gradient for the Linear Head

When fine-tuning only the linear head $W$ of a pretrained model, we compute the gradient of the MSE loss with respect to $W$:

$$L = \frac{1}{m} \lVert Z W - y \rVert^{2}$$

$$\frac{\partial L}{\partial W} = \frac{2}{m} Z^{\top} (Z W - y)$$

where:

$Z$: frozen embeddings, shape (m, d)

$W$: head weights, shape (d, 1)

$y$: ground truth, shape (m, 1)
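The gradient formula can be recovered in a few lines. The sketch below introduces $r = Z W - y$ for the residual; $r$ is notation added here for convenience and does not appear in the problem statement.

$$L = \frac{1}{m} r^{\top} r, \qquad r = Z W - y$$

$$dL = \frac{2}{m} r^{\top} \, dr = \frac{2}{m} r^{\top} Z \, dW$$

$$\frac{\partial L}{\partial W} = \frac{2}{m} Z^{\top} r = \frac{2}{m} Z^{\top} (Z W - y)$$

The result has shape (d, m) times (m, 1), which gives (d, 1), the same shape as $W$.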

This gradient is then used to update $W$ with gradient descent. The backbone that produces $Z$ stays frozen, so $Z$ itself is treated as a constant.
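To make the update concrete, here is a minimal sketch of gradient descent on the head alone, assuming NumPy is available. The helper name gradient_descent_step, the learning rate 0.1, and the 100 iterations are illustrative choices and are not part of the problem statement.

```python
import numpy as np

def gradient_descent_step(Z, W, y, lr=0.1):
    """One gradient descent update of the head W; Z is only read, never modified."""
    m = Z.shape[0]
    grad = (2.0 / m) * Z.T @ (Z @ W - y)  # dL/dW, shape (d, 1)
    return W - lr * grad

# Toy data (assumed for illustration): identity embeddings, targets of 1.
Z = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([[1.0], [1.0]])
W = np.array([[2.0], [2.0]])  # the head starts by overpredicting

for _ in range(100):  # step count chosen arbitrarily
    W = gradient_descent_step(Z, W, y, lr=0.1)

print(W)  # W has moved toward y
```

In this toy setup the gradient equals $W - y$, so each step shrinks the error by a constant factor and $W$ approaches the targets.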

Your task:

Implement finetune_gradient(Z, W, y_true) that returns $\frac{\partial L}{\partial W}$ of shape (d, 1).
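One way the function might be written is sketched below. It assumes NumPy is available and that returning a NumPy array (rather than a nested list) is acceptable to the grader; neither is stated in the problem, so treat both as assumptions.

```python
import numpy as np

def finetune_gradient(Z, W, y_true):
    """Return dL/dW for L = (1/m) * ||Z W - y_true||^2, shape (d, 1)."""
    # Accept nested lists as well as arrays (assumption: inputs may be plain lists).
    Z = np.asarray(Z, dtype=float)                            # (m, d)
    W = np.asarray(W, dtype=float).reshape(-1, 1)             # (d, 1)
    y_true = np.asarray(y_true, dtype=float).reshape(-1, 1)   # (m, 1)

    m = Z.shape[0]
    residual = Z @ W - y_true              # (m, 1) prediction error
    return (2.0 / m) * (Z.T @ residual)    # (d, 1)

# Second example test from the problem: overprediction gives a positive gradient.
grad = finetune_gradient([[1, 0], [0, 1]], [[2], [2]], [[1], [1]])
print(grad.tolist())
```

If the grader expects a plain list, calling .tolist() on the result would convert it.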

Example Tests

Perfect predictions: zero gradient

Input: {"W":[[1],[1]],"Z":[[1,0],[0,1]],"y_true":[[1],[1]]}

Expected: [[0],[0]]

Known gradient: overprediction pushes W down

Input: {"W":[[2],[2]],"Z":[[1,0],[0,1]],"y_true":[[1],[1]]}

Expected: [[1],[1]]

Output shape is (d, 1)

Input: {"W":[[1],[1],[1]],"Z":[[1,2,3],[4,5,6]],"y_true":[[6],[15]]}

Expected shape: [3,1] (the gradient itself is [[0],[0],[0]], because Z W equals y_true for this input)
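Beyond the fixed examples, the analytic formula can be sanity-checked against central finite differences. The sketch below assumes NumPy; the helper names, the random test data, and the step size 1e-6 are illustrative choices, not part of the problem.

```python
import numpy as np

def mse_loss(Z, W, y):
    """L = (1/m) * ||Z W - y||^2 as a Python float."""
    m = Z.shape[0]
    r = Z @ W - y
    return float((r ** 2).sum() / m)

def analytic_gradient(Z, W, y):
    """Closed-form gradient (2/m) * Z^T (Z W - y)."""
    m = Z.shape[0]
    return (2.0 / m) * Z.T @ (Z @ W - y)

def numerical_gradient(Z, W, y, eps=1e-6):
    """Central-difference estimate of dL/dW, one coordinate at a time."""
    grad = np.zeros_like(W)
    for i in range(W.shape[0]):
        step = np.zeros_like(W)
        step[i, 0] = eps
        grad[i, 0] = (mse_loss(Z, W + step, y) - mse_loss(Z, W - step, y)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)   # fixed seed for reproducibility
Z = rng.normal(size=(5, 3))      # m = 5, d = 3 (arbitrary sizes)
W = rng.normal(size=(3, 1))
y = rng.normal(size=(5, 1))

max_diff = np.abs(analytic_gradient(Z, W, y) - numerical_gradient(Z, W, y)).max()
print(max_diff)  # should be very small if the formula is right
```

Because the loss is quadratic in $W$, the central difference has no truncation error, so any remaining gap comes from floating-point rounding.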
