Undersample for Class Balance

Easy
~15 min
code completion

Class imbalance, where one class has far more samples than another, biases models toward the majority class, so they tend to predict it even when the minority class is the one of interest.

A simple remedy is undersampling: keep at most n_per_class samples from each class. For reproducibility, select the first n_per_class indices per class (in their order of appearance in y).

Algorithm:

1. Find the unique classes with np.unique(y), which returns them in sorted order

2. For each class, find its sample indices and keep the first n_per_class

3. Combine all selected indices and return them sorted
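The three steps above can be sketched as follows. This is one possible implementation, not a reference solution: the use of np.flatnonzero to find each class's indices and the guard for an empty y are assumptions that the problem statement does not prescribe.

```python
import numpy as np


def undersample(y, n_per_class):
    """Return sorted indices keeping at most n_per_class samples per class."""
    y = np.asarray(y)
    selected = []
    # Step 1: np.unique returns the distinct labels in sorted order.
    for label in np.unique(y):
        # Step 2: indices of this class, in order of appearance in y.
        class_indices = np.flatnonzero(y == label)
        # Keep the first n_per_class of them (fewer if the class is smaller).
        selected.append(class_indices[:n_per_class])
    # Assumed edge case: an empty y yields an empty index array.
    if not selected:
        return np.array([], dtype=int)
    # Step 3: merge the per-class picks and sort them into one index array.
    return np.sort(np.concatenate(selected))


print(undersample([0, 0, 0, 1, 1, 1], 2))  # [0 1 3 4]
```

Slicing with [:n_per_class] handles classes that have fewer than n_per_class samples, since a slice past the end of an array simply returns what is there.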

Example: y = [0, 0, 0, 1, 1, 1], n_per_class = 2

  • Class 0 indices: [0, 1, 2] → keep [0, 1]
  • Class 1 indices: [3, 4, 5] → keep [3, 4]
  • Result: [0, 1, 3, 4]
Your task:

Implement undersample(y, n_per_class) that returns a sorted integer array of selected sample indices.

Example Tests

Balanced 3+3 dataset undersampled to 2 per class

Input: {"y":[0,0,0,1,1,1],"n_per_class":2}

Expected: [0,1,3,4]

3-class dataset: keep 1 from each class

Input: {"y":[0,1,0,1,2,2],"n_per_class":1}

Expected: [0,1,4]

Single class with 3 samples: keep first 2

Input: {"y":[1,1,1],"n_per_class":2}

Expected: [0,1]
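The example tests can also be checked against a single-pass variant that does not use NumPy. This is an illustrative sketch only: the name undersample_single_pass is hypothetical, and the function returns a plain list rather than the integer array the task asks for. It walks y once and keeps an index while its class is still under quota, so the result comes out already sorted.

```python
from collections import Counter


def undersample_single_pass(y, n_per_class):
    """Hypothetical pure-Python variant: keep an index while its class is under quota."""
    kept = Counter()  # how many samples have been kept so far, per class
    selected = []
    for index, label in enumerate(y):
        if kept[label] < n_per_class:
            kept[label] += 1
            selected.append(index)
    # Indices were visited in increasing order, so no final sort is needed.
    return selected


# The three example tests from the problem statement.
examples = [
    ([0, 0, 0, 1, 1, 1], 2, [0, 1, 3, 4]),
    ([0, 1, 0, 1, 2, 2], 1, [0, 1, 4]),
    ([1, 1, 1], 2, [0, 1]),
]
for y, n_per_class, expected in examples:
    assert undersample_single_pass(y, n_per_class) == expected
```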
