Fine-tuning Head Gradient

Medium
~20 min
code completion

Fine-tuning: Gradient for the Linear Head

When fine-tuning only the linear head of a pretrained model, we compute the gradient of the MSE loss with respect to the head weights $W$:

$$\nabla_W L = \frac{2}{m} Z^\top (ZW - y_{\text{true}})$$

where:

  • $Z$: frozen embeddings, shape (m, d)
  • $W$: head weights, shape (d, 1)
  • $y_{\text{true}}$: ground truth, shape (m, 1)

This gradient is then used to update $W$ with gradient descent; the backbone is never touched.
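The update step described above can be sketched as follows. This is a minimal illustration, not part of the exercise: the helper name `head_update_step` and the learning rate `lr` are hypothetical, and the gradient is assumed to take the mean-squared-error form (2/m) Zᵀ(ZW − y_true), which is consistent with the example tests. Only `W` changes between iterations; `Z` stands in for the frozen backbone's output and is never modified.

```python
import numpy as np

def head_update_step(Z, W, y_true, lr=0.1):
    """One gradient-descent step on the head weights only.

    `lr` is a hypothetical learning rate chosen for illustration.
    """
    m = Z.shape[0]                              # number of examples
    grad = (2.0 / m) * Z.T @ (Z @ W - y_true)   # (d, 1), assumed MSE gradient
    return W - lr * grad                        # Z (frozen embeddings) is untouched

# Toy data taken from the "known gradient" example test.
Z = np.array([[1.0, 0.0], [0.0, 1.0]])
W = np.array([[2.0], [2.0]])
y_true = np.array([[1.0], [1.0]])

for _ in range(50):
    W = head_update_step(Z, W, y_true)
```

On this toy input the error shrinks by a constant factor each step, so `W` drifts from 2 toward the target value 1.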

    Your task:

    Implement finetune_gradient(Z, W, y_true) that returns the gradient of the loss with respect to W, of shape (d, 1).
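One possible NumPy sketch of the requested function is shown below. It assumes the loss is the mean squared error averaged over the m examples, so the gradient carries a 2/m factor; that normalization is inferred from the example tests rather than stated by a reference solution. Inputs are converted with `np.asarray` on the assumption that they may arrive as nested lists, as in the test cases.

```python
import numpy as np

def finetune_gradient(Z, W, y_true):
    """Gradient of the MSE loss w.r.t. the head weights W (backbone frozen)."""
    Z = np.asarray(Z, dtype=float)            # (m, d) frozen embeddings
    W = np.asarray(W, dtype=float)            # (d, 1) head weights
    y_true = np.asarray(y_true, dtype=float)  # (m, 1) ground truth
    m = Z.shape[0]
    residual = Z @ W - y_true                 # (m, 1) prediction error
    return (2.0 / m) * (Z.T @ residual)       # (d, 1), assumed 2/m scaling
```

Keeping every array two-dimensional means the result has shape (d, 1) without any explicit reshape.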

    Example Tests

    Perfect predictions: zero gradient

    Input: {"W":[[1],[1]],"Z":[[1,0],[0,1]],"y_true":[[1],[1]]}

    Expected: [[0],[0]]

    Known gradient: overprediction pushes W down

    Input: {"W":[[2],[2]],"Z":[[1,0],[0,1]],"y_true":[[1],[1]]}

    Expected: [[1],[1]]
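The expected value in this test can be checked by hand. The arithmetic below assumes the gradient has the mean-squared-error form (2/m) Zᵀ(ZW − y_true) with m = 2 examples:

```latex
\begin{aligned}
ZW - y_{\text{true}}
  &= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
     \begin{bmatrix} 2 \\ 2 \end{bmatrix}
   - \begin{bmatrix} 1 \\ 1 \end{bmatrix}
   = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \\
\nabla_W L
  &= \frac{2}{m} Z^\top (ZW - y_{\text{true}})
   = \frac{2}{2}
     \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
     \begin{bmatrix} 1 \\ 1 \end{bmatrix}
   = \begin{bmatrix} 1 \\ 1 \end{bmatrix}.
\end{aligned}
```

Both entries are positive, so a descent step that subtracts a positive multiple of this gradient lowers both weights, which is what the test title describes.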

    Output shape is (d, 1)

    Input: {"W":[[1],[1],[1]],"Z":[[1,2,3],[4,5,6]],"y_true":[[6],[15]]}

    Expected: [3,1] (the shape of the returned array, with d = 3)
