Causal Attention Mask

Medium
~6 min
code completion

In autoregressive decoding, token $i$ is only allowed to attend to tokens $j \le i$.

If future tokens are visible during training, the model can "cheat" by reading the very tokens it is supposed to predict.

A causal mask is defined as:

$$
M_{ij} =
\begin{cases}
1 & \text{if } j \le i \\
0 & \text{otherwise}
\end{cases}
$$

This is a lower-triangular matrix. In full attention implementations, entries with 0 are usually turned into $-\infty$ before softmax.
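To illustrate how a 0/1 mask is typically consumed, here is a minimal sketch of a masked row-wise softmax. The helper name `masked_softmax` and the all-zero `scores` matrix are assumptions made for illustration; they are not part of the exercise.

```python
import numpy as np

def masked_softmax(scores, mask):
    """Row-wise softmax over `scores`, ignoring positions where mask == 0.

    Hypothetical helper for illustration only.
    """
    # Disallowed positions become -inf, so exp(-inf) == 0 after softmax.
    masked = np.where(mask > 0, scores, -np.inf)
    # Subtract the row max for numerical stability. The diagonal is always
    # allowed in a causal mask, so every row has at least one finite entry.
    masked = masked - masked.max(axis=-1, keepdims=True)
    weights = np.exp(masked)
    return weights / weights.sum(axis=-1, keepdims=True)

scores = np.zeros((3, 3))        # assumed raw attention scores
mask = np.tril(np.ones((3, 3)))  # lower-triangular causal mask
weights = masked_softmax(scores, mask)
```

With equal scores, each row spreads its weight uniformly over the visible positions, and every future position gets exactly zero weight.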

Your task: Implement causal_mask(seq_len) to return a float NumPy array of shape (seq_len, seq_len).

Implementation requirements:

1. Include the diagonal (self-attention is allowed).

2. Return numeric values 1.0 and 0.0.

3. Use a vectorized NumPy approach.
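One possible way to meet these requirements is sketched below using `np.tril`. It is one approach among several, not necessarily the reference solution.

```python
import numpy as np

def causal_mask(seq_len):
    """Return a (seq_len, seq_len) float mask: 1.0 where j <= i, else 0.0."""
    # np.ones builds a float matrix; np.tril zeroes every entry above the
    # main diagonal and keeps the diagonal itself (default k=0).
    return np.tril(np.ones((seq_len, seq_len), dtype=float))
```

An equivalent vectorized alternative compares row and column indices through broadcasting, `np.arange(seq_len)[:, None] >= np.arange(seq_len)[None, :]`, and casts the boolean result to float.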

Example Tests

Single token mask

Input: {"seq_len":1}

Expected: [[1]]

Lower-triangular mask for seq_len=3

Input: {"seq_len":3}

Expected: [[1,0,0],[1,1,0],[1,1,1]]

Shape check: seq_len=6

Input: {"seq_len":6}

Expected: [6,6]
