Target Encoding for High-Cardinality Features

Easy
~15 min
code completion

Target Encoding for High-Cardinality Features

Target encoding replaces each categorical value with the mean of the target variable for that category. It is far more memory-efficient than one-hot encoding when a feature has hundreds or thousands of unique values (e.g., zip codes, product IDs).

Algorithm:

1. For each unique category , collect all target values where categories[i] == c

2. Replace:

Example:

categories = ["A", "B", "A", "C"]
targets    = [2,   4,   6,   8]
  • A → mean([2, 6]) = 4.0
  • B → mean([4]) = 4.0
  • C → mean([8]) = 8.0
  • Result: [4.0, 4.0, 4.0, 8.0]
  • > Leakage warning: In production, compute encoding means from training data only and apply them at inference time.

    Your task:

    Implement target_encode(categories, targets) that returns a float array of the same length.

    Example Tests

    Three categories: A and B have equal means, C differs

    Input: {"targets":[2,4,6,8],"categories":["A","B","A","C"]}

    Expected: [4,4,4,8]

    X appears twice — mean computed over both occurrences

    Input: {"targets":[10,20,5],"categories":["X","X","Y"]}

    Expected: [15,15,5]

    All unique categories: each maps to its own target value

    Input: {"targets":[1,2,3],"categories":["A","B","C"]}

    Expected: [1,2,3]

    Sign in to solve this problem

    You can read the full problem statement above. Create a free account to run code in the browser, submit solutions, and track your progress.