Undersample for Class Balance
Class imbalance — one class having far more samples than another — biases models toward the majority class, making them predict it even when the minority class is the point of interest.
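A simple remedy is undersampling: keep at most n_per_class samples from each class. A class with fewer than n_per_class samples keeps all of its samples. To make the result deterministic and reproducible, select the first n_per_class indices of each class, in their order of appearance in y.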
Algorithm:
1. Find unique classes with np.unique(y) — returns them in sorted order
2. For each class, find its sample indices and keep the first n_per_class
3. Combine all selected indices and return them sorted
Example: y = [0, 0, 0, 1, 1, 1], n_per_class = 2
Result: [0, 1, 3, 4]
Your task:
Implement undersample(y, n_per_class) that returns a sorted integer array of selected sample indices.
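One possible way to follow the algorithm steps above is sketched below. It is a minimal illustration rather than a reference solution. It assumes y is a one-dimensional sequence of labels and n_per_class is a non-negative integer. The use of np.flatnonzero and the empty-input guard are choices made for this sketch, not requirements stated in the task.

```python
import numpy as np

def undersample(y, n_per_class):
    """Return sorted indices keeping at most n_per_class samples per class."""
    y = np.asarray(y)
    selected = []
    # np.unique returns the distinct class labels in sorted order.
    for cls in np.unique(y):
        # Indices of this class, in order of appearance in y.
        idx = np.flatnonzero(y == cls)
        # Keep the first n_per_class (or all, if the class is smaller).
        selected.append(idx[:n_per_class])
    # Guard added for this sketch: np.concatenate fails on an empty list.
    if not selected:
        return np.array([], dtype=int)
    # Combine the per-class selections and sort them.
    return np.sort(np.concatenate(selected))

print(undersample([0, 0, 0, 1, 1, 1], 2).tolist())  # [0, 1, 3, 4]
```

The returned indices can then be used to subset the data, for example y_arr[idx] on a NumPy array y_arr, and the same indices applied to a feature matrix if one exists.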
Example Tests
Balanced 3+3 dataset undersampled to 2 per class
Input: {"y":[0,0,0,1,1,1],"n_per_class":2}
Expected: [0,1,3,4]
3-class dataset: keep 1 from each class
Input: {"y":[0,1,0,1,2,2],"n_per_class":1}
Expected: [0,1,4]
Single class with 3 samples: keep first 2
Input: {"y":[1,1,1],"n_per_class":2}
Expected: [0,1]