Easy

Target Encoding for High-Cardinality Features

~15 min

code completion

Target encoding replaces each categorical value with the mean of the target variable for that category. It produces a single numeric column regardless of cardinality, whereas one-hot encoding adds one column per unique value, so it is far more memory-efficient when a feature has hundreds or thousands of unique values (e.g., zip codes, product IDs).
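To make the size difference concrete, here is a minimal sketch that counts the columns each encoding would produce. The helper name encoded_widths and the synthetic zip-code list are assumptions for illustration only, and the count reflects a dense one-hot layout (sparse representations narrow the gap).

```python
def encoded_widths(categories):
    """Count the columns each encoding would need (hypothetical helper)."""
    n_unique = len(set(categories))
    return {
        "one_hot_columns": n_unique,   # one indicator column per unique value
        "target_encoded_columns": 1,   # a single float column, whatever the cardinality
    }


# Synthetic data: 1000 distinct zip-code strings, each repeated 3 times.
zip_codes = [f"{z:05d}" for z in range(1000)] * 3
print(encoded_widths(zip_codes))
# {'one_hot_columns': 1000, 'target_encoded_columns': 1}
```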

Algorithm:

1. For each unique category $c$, collect all target values where categories[i] == c

2. Replace: $\text{encoded}[i] = \operatorname{mean}(\{\text{targets}[j] : \text{categories}[j] = c\})$

Example:

categories = ["A", "B", "A", "C"]
targets    = [2,   4,   6,   8]

A → mean([2, 6]) = 4.0

B → mean([4]) = 4.0

C → mean([8]) = 8.0

Result: [4.0, 4.0, 4.0, 8.0]
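The algorithm and worked example above can be sketched in plain Python as follows. This is one possible approach rather than a reference solution: it accumulates a running sum and count per category in a single pass, then looks up each position's mean. The length check is an added assumption, not something the problem statement requires.

```python
from collections import defaultdict


def target_encode(categories, targets):
    """Replace each category with the mean target observed for that category."""
    if len(categories) != len(targets):
        # Assumption: mismatched inputs are treated as an error.
        raise ValueError("categories and targets must have the same length")

    sums = defaultdict(float)   # category -> sum of its targets
    counts = defaultdict(int)   # category -> number of occurrences
    for c, t in zip(categories, targets):
        sums[c] += t
        counts[c] += 1

    means = {c: sums[c] / counts[c] for c in sums}
    return [means[c] for c in categories]


print(target_encode(["A", "B", "A", "C"], [2, 4, 6, 8]))
# [4.0, 4.0, 4.0, 8.0]
```

Two passes over the data keep the cost linear in the input length, which matters when the feature has many rows as well as many categories.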

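> Leakage warning: Because the encoding is built from the target itself, computing means over rows you later evaluate on leaks label information into the feature. In production, compute encoding means from training data only and apply them at inference time.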
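A minimal sketch of that train/inference split might look like the following. The function names fit_target_means and apply_target_means are hypothetical, and falling back to the global training mean for categories never seen in training is an assumption made here, not something the problem specifies.

```python
def fit_target_means(train_categories, train_targets):
    """Learn per-category means from training data only (hypothetical helper).

    Assumes the training set is non-empty.
    """
    sums, counts = {}, {}
    for c, t in zip(train_categories, train_targets):
        sums[c] = sums.get(c, 0.0) + t
        counts[c] = counts.get(c, 0) + 1
    means = {c: sums[c] / counts[c] for c in sums}
    global_mean = sum(train_targets) / len(train_targets)
    return means, global_mean


def apply_target_means(categories, means, default):
    """Encode new data with the stored means; unseen categories get `default`."""
    return [means.get(c, default) for c in categories]


# Fit on training rows, then encode inference rows without touching their targets.
means, global_mean = fit_target_means(["A", "B", "A", "C"], [2, 4, 6, 8])
print(apply_target_means(["A", "C", "Z"], means, global_mean))
# [4.0, 8.0, 5.0]  ("Z" was not in training, so it gets the global mean 5.0)
```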

Your task:

Implement target_encode(categories, targets) that returns a float array of the same length as the inputs, where element i is the mean target of the category at position i.

Example Tests

Three categories: A and B have equal means, C differs

Input: {"targets":[2,4,6,8],"categories":["A","B","A","C"]}

Expected: [4,4,4,8]

X appears twice: its mean is computed over both occurrences

Input: {"targets":[10,20,5],"categories":["X","X","Y"]}

Expected: [15,15,5]

All unique categories: each maps to its own target value

Input: {"targets":[1,2,3],"categories":["A","B","C"]}

Expected: [1,2,3]
