Query-Key Attention Scores

Easy
~12 min
code completion

Query-Key Attention Scores

The first step of scaled dot-product attention computes raw scores measuring query-key compatibility:

  • has shape — query matrix
  • has shape — key matrix
  • has shape — entry is how much query attends to key
  • The scaling keeps the dot products from growing large in high dimensions, which would push softmax into a saturated regime where gradients vanish.

    should be inferred from the last dimension of rather than hardcoded.

    Your task:

    Implement attention_scores(Q, K) that returns the scaled score matrix.

    Example Tests

    2 queries, 2 keys, identity-like: scores on diagonal

    Input: {"K":[[1,0],[0,1]],"Q":[[1,0],[0,1]]}

    Expected: [[0.70711,0],[0,0.70711]]

    1 query attending to 3 keys

    Input: {"K":[[1,0],[0,1],[1,1]],"Q":[[1,1]]}

    Expected: [[0.70711,0.70711,1.41421]]

    d_k=3 scaling applied correctly

    Input: {"K":[[1,0,0]],"Q":[[2,0,1]]}

    Expected: [[1.1547]]

    Sign in to solve this problem

    You can read the full problem statement above. Create a free account to run code in the browser, submit solutions, and track your progress.