Easy

Query-Key Attention Scores


~12 min

code completion


The first step of scaled dot-product attention computes raw scores measuring query-key compatibility:

$$S = \frac{Q K^{\top}}{\sqrt{d_{k}}}$$

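$Q$ has shape $(T_{q}, d_{k})$: the query matrix.

$K$ has shape $(T_{k}, d_{k})$: the key matrix.

$S$ has shape $(T_{q}, T_{k})$: entry $S_{ij}$ is the raw (pre-softmax) score of query $i$ against key $j$, which determines how strongly query $i$ attends to key $j$ once softmax is applied.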
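Written entry by entry (an equivalent restatement of the matrix formula, not an extra step), each score is the dot product of one query row with one key row, scaled by $1/\sqrt{d_{k}}$:

$$
S_{ij} = \frac{1}{\sqrt{d_{k}}} \sum_{m=1}^{d_{k}} Q_{im} K_{jm}, \qquad i = 1, \dots, T_{q}, \quad j = 1, \dots, T_{k}
$$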

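The $\frac{1}{\sqrt{d_{k}}}$ scaling keeps the dot products from growing large in high dimensions. If the components of a query and a key are independent with mean 0 and variance 1, their dot product is a sum of $d_{k}$ such products and has variance $d_{k}$; dividing by $\sqrt{d_{k}}$ brings the variance back to 1. Without the scaling, large scores push softmax into a saturated regime where gradients vanish.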
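To make the variance argument concrete, here is a small simulation sketch. It assumes every query and key component is drawn independently from N(0, 1), which is an illustrative assumption rather than something the problem specifies; the function name `dot_product_variance` is hypothetical. The raw variance should land near $d_{k}$ and the scaled variance near 1.

```python
import math
import random
from typing import List, Tuple


def dot_product_variance(d_k: int, n_samples: int = 2000, seed: int = 0) -> Tuple[float, float]:
    """Estimate Var(q . k) and Var(q . k / sqrt(d_k)) for random vectors q, k.

    Assumption (illustrative only): each component of q and k is i.i.d. N(0, 1).
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    raw: List[float] = []
    for _ in range(n_samples):
        q = [rng.gauss(0.0, 1.0) for _ in range(d_k)]
        k = [rng.gauss(0.0, 1.0) for _ in range(d_k)]
        raw.append(sum(a * b for a, b in zip(q, k)))
    scaled = [s / math.sqrt(d_k) for s in raw]

    def variance(xs: List[float]) -> float:
        # Population variance of the sample.
        mean = sum(xs) / len(xs)
        return sum((x - mean) ** 2 for x in xs) / len(xs)

    return variance(raw), variance(scaled)


for d_k in (4, 64, 256):
    raw_var, scaled_var = dot_product_variance(d_k)
    print(f"d_k={d_k:4d}  Var(raw)={raw_var:8.2f}  Var(scaled)={scaled_var:5.2f}")
```

The raw variance grows roughly linearly with $d_{k}$, while the scaled variance stays close to 1 for every dimension.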

$d_{k}$ should be inferred from the last dimension of $K$ rather than hardcoded.

Your task:

Implement attention_scores(Q, K) that returns the scaled score matrix.
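A minimal pure-Python sketch of `attention_scores`, assuming inputs arrive as nested lists like the JSON in the example tests (NumPy is avoided only to keep the block dependency-free). Raising `ValueError` on an empty `K` or on rows of the wrong length is a choice of this sketch, not part of the problem statement.

```python
import math
from typing import List

Matrix = List[List[float]]


def attention_scores(Q: Matrix, K: Matrix) -> Matrix:
    """Return S = Q K^T / sqrt(d_k), a T_q x T_k nested list."""
    if not K:
        raise ValueError("K must contain at least one row to infer d_k")
    d_k = len(K[0])  # inferred from the last dimension of K, not hardcoded
    # Every query and key row must live in the same d_k-dimensional space.
    if any(len(row) != d_k for row in K) or any(len(row) != d_k for row in Q):
        raise ValueError(f"all rows of Q and K must have length d_k={d_k}")
    scale = 1.0 / math.sqrt(d_k)
    # S[i][j] = scale * (q_i . k_j). Looping over rows of K is the same as
    # looping over columns of K^T, so no explicit transpose is built.
    return [[scale * sum(q * k for q, k in zip(q_row, k_row)) for k_row in K] for q_row in Q]


S = attention_scores([[1, 1]], [[1, 0], [0, 1], [1, 1]])
print([[round(x, 5) for x in row] for row in S])  # → [[0.70711, 0.70711, 1.41421]]
```

With NumPy arrays the same computation is `Q @ K.T / np.sqrt(K.shape[-1])`, which also reads $d_{k}$ from the last dimension of `K`.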

Example Tests

2 queries, 2 keys, identity-like: scores on diagonal

Input: {"K":[[1,0],[0,1]],"Q":[[1,0],[0,1]]}

Expected: [[0.70711,0],[0,0.70711]]

1 query attending to 3 keys

Input: {"K":[[1,0],[0,1],[1,1]],"Q":[[1,1]]}

Expected: [[0.70711,0.70711,1.41421]]

d_k=3 scaling applied correctly

Input: {"K":[[1,0,0]],"Q":[[2,0,1]]}

Expected: [[1.1547]]
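As a hand check of the third example, $d_{k} = 3$ comes from the length of the single key row, and the one score is:

$$
S_{11} = \frac{2 \cdot 1 + 0 \cdot 0 + 1 \cdot 0}{\sqrt{3}} = \frac{2}{\sqrt{3}} \approx 1.1547
$$

Dividing by $d_{k} = 3$ instead of $\sqrt{3}$ would give $0.6667$, a quick way to spot a missing square root.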
