Attention & Transformers
Medium
Softmax Attention Weights
~12 min
code completion
Attention Weights via Softmax
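In the attention mechanism, raw scores measure how relevant each key is to a query. Softmax converts these scores into non-negative weights that sum to 1, so they can be read as a probability distribution over the keys. For a score vector $s = (s_1, \dots, s_n)$, the weight on key $i$ is:

$$w_i = \frac{e^{s_i}}{\sum_{j=1}^{n} e^{s_j}}$$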
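For numerical stability, subtract the maximum score from every score before exponentiating (the same trick used for softmax in general). The weights are unchanged, because the common factor cancels in the ratio, but the largest exponent becomes 0, so the exponentials cannot overflow.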
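The max-subtraction trick can be sketched in plain Python as follows. This is one possible version under stated assumptions, not a reference solution: the input is taken to be a list of numbers, and returning an empty list for empty input is a choice made here, not something the exercise specifies.

```python
import math

def softmax_attention(scores):
    """Convert a 1D list of raw attention scores into weights that sum to 1."""
    if not scores:
        return []  # assumption: empty input yields empty weights

    # Shift by the maximum so the largest exponent is exp(0) = 1.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]

    # Normalize so the weights sum to 1.
    total = sum(exps)
    return [e / total for e in exps]

print(softmax_attention([0, 0, 0, 0]))  # [0.25, 0.25, 0.25, 0.25]
```

Without the shift, a call such as softmax_attention([1000, 1001]) would raise an OverflowError in math.exp; with it, the largest value passed to math.exp is 0.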
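Computing these weights is one step of the attention formula. In the standard scaled dot-product form, the scores are $QK^\top / \sqrt{d_k}$, and the resulting weights are used to take a weighted sum of the values $V$:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right) V$$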
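To show where the weights fit, here is a minimal sketch of attention for a single query. It assumes the standard scaled dot-product form (scores are dot products divided by the square root of the key dimension). The function name attend and its parameters query, keys, and values are hypothetical and not part of the exercise.

```python
import math

def attend(query, keys, values):
    """Single-query scaled dot-product attention (illustrative sketch).

    query:  list of d_k floats
    keys:   list of n key vectors, each of length d_k
    values: list of n value vectors, all of the same length
    Returns (weights, output).
    """
    d_k = len(query)

    # Raw scores: scaled dot product between the query and each key.
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k)
        for key in keys
    ]

    # Softmax with max subtraction for numerical stability.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Output: weighted sum of the value vectors.
    dim_v = len(values[0])
    output = [
        sum(w * v[j] for w, v in zip(weights, values))
        for j in range(dim_v)
    ]
    return weights, output

weights, output = attend([1, 0], [[1, 0], [1, 0]], [[1, 2], [3, 4]])
print(weights, output)  # [0.5, 0.5] [2.0, 3.0]
```

Because both keys match the query equally, the weights are uniform and the output is the plain average of the two value vectors.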
Your task:
Implement softmax_attention(scores), which converts a 1D vector of scores into attention weights.
Example Tests
Softmax weights sum to 1.0
Input: {"scores":[1,2,3]}
Expected: 1
Equal scores: uniform weights
Input: {"scores":[0,0,0,0]}
Expected: [0.25,0.25,0.25,0.25]
Dominant score takes most weight
Input: {"scores":[0,10,0]}
Expected: [0.0000454,0.9999092,0.0000454]