Attention & Transformers — ML Concept & Coding Problems | GRADuate

Attention & Transformers

Expert

From "what to look at" to the engine behind GPT and BERT. Scaled dot-product attention replaces recurrence with direct token-to-token relationships.
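For reference, scaled dot-product attention is usually written as in the formula below. The page itself gives no formula, so this assumes the common notation:

- Q holds one query row per token.
- K and V hold one key row and one value row per token.
- d_k is the key dimension.
- The softmax is applied to each row separately.

Entry (i, j) of Q K^T is the direct score between token i and token j, which is what lets attention drop recurrence. Dividing by the square root of d_k keeps those scores from growing with the key dimension and saturating the softmax.

```latex
\operatorname{Attention}(Q, K, V) = \operatorname{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V,
\qquad Q \in \mathbb{R}^{n \times d_k},\; K \in \mathbb{R}^{m \times d_k},\; V \in \mathbb{R}^{m \times d_v}
```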

Learning Objectives

→ Compute scaled dot-product attention
→ Understand queries, keys, and values
→ Implement softmax-based attention weights
→ Describe multi-head attention and positional encoding
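Most of these objectives fit in a short NumPy sketch. It is one possible implementation, not the course's reference solution. It makes these assumptions:

- Arrays are unbatched, or batched only along a leading head axis.
- A boolean mask uses True to mean "may attend".
- The function names (`softmax`, `scaled_dot_product_attention`, `multi_head_attention`) and the random projection weights are hypothetical, chosen for illustration.

The sketch computes query-key scores, turns them into softmax weights, and returns context vectors as weighted averages of the values. It then reuses the same routine for multi-head attention by splitting the model dimension into heads.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    shifted = x - np.max(x, axis=axis, keepdims=True)  # avoid overflow in exp
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)


def scaled_dot_product_attention(q, k, v, mask=None):
    """Return (context, weights).

    q: (..., n, d_k), k: (..., m, d_k), v: (..., m, d_v).
    mask (optional): boolean array broadcastable to (..., n, m);
    True means the query may attend to that key (an assumed convention).
    """
    d_k = q.shape[-1]
    # Query-key scores: entry (i, j) compares query i with key j.
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d_k)
    if mask is not None:
        # Blocked scores become -inf, so their softmax weight is exactly 0.
        scores = np.where(mask, scores, -np.inf)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    context = weights @ v  # weighted average of value rows
    return context, weights


def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads, mask=None):
    """Self-attention over x: (n, d_model) using num_heads heads.

    w_q, w_k, w_v, w_o: (d_model, d_model) projection matrices
    (hypothetical here; a trained model learns them). No biases.
    """
    n, d_model = x.shape
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads

    def split_heads(t):
        # (n, d_model) -> (num_heads, n, d_head)
        return t.reshape(n, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split_heads(x @ w_q), split_heads(x @ w_k), split_heads(x @ w_v)
    # Batched matmul runs every head independently, each scaled by sqrt(d_head).
    context, _ = scaled_dot_product_attention(q, k, v, mask)
    # (num_heads, n, d_head) -> (n, d_model): concatenate heads, then mix with w_o.
    concat = context.transpose(1, 0, 2).reshape(n, d_model)
    return concat @ w_o


rng = np.random.default_rng(0)
q = rng.normal(size=(3, 4))
k = rng.normal(size=(5, 4))
v = rng.normal(size=(5, 2))
context, weights = scaled_dot_product_attention(q, k, v)
print(context.shape, weights.shape)  # (3, 2) (3, 5)
print(np.allclose(weights.sum(axis=-1), 1.0))  # True

x = rng.normal(size=(4, 8))
w_q, w_k, w_v, w_o = (rng.normal(size=(8, 8)) for _ in range(4))
print(multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads=2).shape)  # (4, 8)
```

Subtracting the row maximum inside `softmax` leaves the result unchanged, because softmax is invariant to adding a constant to a row, but it prevents overflow in `np.exp`. If every key in a row is masked, that row becomes NaN. A causal mask never does this because each token can always attend to itself. Production implementations usually also add a batch dimension, bias terms, and dropout, which this sketch leaves out.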

Practice

The optional multiple-choice concept check tracks your understanding. Browse the coding problems below, then sign in when you're ready to solve them.

Coding Problems (7)

Softmax Attention Weights

~12 min · Medium

Scaled Dot-Product Attention

~25 min · Hard

Sinusoidal Positional Encoding

~12 min · Medium

Causal Attention Mask

~6 min · Medium

Multi-Head Attention

~18 min · Hard

Query-Key Attention Scores

~12 min · Easy

Attention Context Vector

~10 min · Easy
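The positional-encoding and causal-mask problems above reduce to small array constructions. This NumPy sketch makes these assumptions:

- It uses the sinusoidal scheme from the original Transformer: sine on even dimensions, cosine on odd ones, base 10000.
- The model dimension is even.
- The helper names (`sinusoidal_positional_encoding`, `causal_mask`, `masked_softmax`) are hypothetical.

`masked_softmax` shows one common way a mask is applied: blocked scores are set to negative infinity before the softmax.

```python
import numpy as np


def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """(seq_len, d_model) position codes: sin on even columns, cos on odd ones."""
    assert d_model % 2 == 0, "this sketch assumes an even d_model"
    positions = np.arange(seq_len)[:, np.newaxis]   # (seq_len, 1)
    two_i = np.arange(0, d_model, 2)[np.newaxis, :]  # (1, d_model // 2)
    # Frequencies shrink geometrically with the dimension index (base 10000).
    angles = positions / np.power(10000.0, two_i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe


def causal_mask(seq_len: int) -> np.ndarray:
    """Boolean (seq_len, seq_len) mask: True where query i may attend key j, i.e. j <= i."""
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))


def masked_softmax(scores: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Row-wise softmax that assigns exactly zero weight where mask is False."""
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(scores)
    return exp / exp.sum(axis=-1, keepdims=True)


pe = sinusoidal_positional_encoding(seq_len=50, d_model=16)
print(pe.shape)   # (50, 16)
print(pe[0, :4])  # [0. 1. 0. 1.]
print(causal_mask(3).astype(int))
# [[1 0 0]
#  [1 1 0]
#  [1 1 1]]
# With equal scores, each row spreads its weight over the visible prefix only.
print(masked_softmax(np.zeros((3, 3)), causal_mask(3))[1])  # [0.5 0.5 0. ]
```

The encoding is normally added element-wise to the token embeddings, which is why it has shape (seq_len, d_model). Frameworks disagree on whether True in a boolean mask means "keep" or "block", so check the convention before reusing a mask.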