Query-Key Attention Scores
Easy
~12 min
code completion
The first step of scaled dot-product attention computes raw scores measuring query-key compatibility:
$$\text{scores} = \frac{Q K^\top}{\sqrt{d_k}}$$
The scaling keeps the dot products from growing large in high dimensions, which would push softmax into a saturated regime where gradients vanish.
$d_k$ should be inferred from the last dimension of $Q$ (which must match the last dimension of $K$) rather than hardcoded.
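To make the saturation argument concrete, the sketch below compares softmax over unscaled and scaled dot products for one query and three keys with 64 dimensions. The vectors, the dimension, and the helper names `softmax` and `dot` are illustrative assumptions, not part of the exercise.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Hypothetical data: one query and three fairly similar keys.
d_k = 64
q = [1.0] * d_k
keys = [[1.0] * d_k, [0.9] * d_k, [0.8] * d_k]

raw = [dot(q, k) for k in keys]              # roughly 64, 57.6, 51.2
scaled = [s / math.sqrt(d_k) for s in raw]   # roughly 8, 7.2, 6.4

p_raw = softmax(raw)        # almost all mass on the first key (about 0.998)
p_scaled = softmax(scaled)  # noticeably softer (about 0.61 on the first key)

print("unscaled:", [round(p, 4) for p in p_raw])
print("scaled:  ", [round(p, 4) for p in p_scaled])
```

Even though the three keys differ only slightly, the unscaled scores produce a nearly one-hot distribution, which is the regime where softmax gradients become very small. Dividing by the square root of the dimension keeps the distribution spread out.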
Your task:
Implement attention_scores(Q, K) that returns the scaled score matrix.
Example Tests
2 queries, 2 keys, identity-like: scores on diagonal
Input: {"K":[[1,0],[0,1]],"Q":[[1,0],[0,1]]}
Expected: [[0.70711,0],[0,0.70711]]
1 query attending to 3 keys
Input: {"K":[[1,0],[0,1],[1,1]],"Q":[[1,1]]}
Expected: [[0.70711,0.70711,1.41421]]
d_k=3 scaling applied correctly
Input: {"K":[[1,0,0]],"Q":[[2,0,1]]}
Expected: [[1.1547]]
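One possible solution is sketched below in plain Python. It assumes Q and K arrive as non-empty nested lists, as in the JSON inputs above, and computes Q times the transpose of K divided by the square root of d_k. The helper `all_close` is a hypothetical name used only to compare against the expected values. A NumPy or PyTorch version would follow the same shape logic.

```python
import math

def attention_scores(Q, K):
    # Assumption: Q is a list of queries, K a list of keys,
    # and every row has the same length d_k.
    d_k = len(Q[0])  # inferred from the last dimension of Q, not hardcoded
    scale = math.sqrt(d_k)
    # scores[i][j] = (Q[i] . K[j]) / sqrt(d_k)
    return [
        [sum(q_x * k_x for q_x, k_x in zip(q, k)) / scale for k in K]
        for q in Q
    ]

def all_close(actual, expected, tol=1e-4):
    # Hypothetical helper: element-wise comparison with a small tolerance.
    return all(
        abs(a - e) <= tol
        for row_a, row_e in zip(actual, expected)
        for a, e in zip(row_a, row_e)
    )

# The three example tests from the problem statement.
out1 = attention_scores([[1, 0], [0, 1]], [[1, 0], [0, 1]])
out2 = attention_scores([[1, 1]], [[1, 0], [0, 1], [1, 1]])
out3 = attention_scores([[2, 0, 1]], [[1, 0, 0]])

print(all_close(out1, [[0.70711, 0], [0, 0.70711]]))
print(all_close(out2, [[0.70711, 0.70711, 1.41421]]))
print(all_close(out3, [[1.1547]]))
```

The result has one row per query and one column per key, so a single query against three keys yields a 1 by 3 matrix.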