Winsorize Outlier Values
Winsorization replaces extreme values with the nearest percentile boundary rather than removing them. This preserves dataset size while limiting the influence of outliers on model training.
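For lower percentile lower_pct and upper percentile upper_pct:
1. Compute the lower bound (the lower_pct-th percentile of arr) and the upper bound (the upper_pct-th percentile of arr)
2. Clip: any value below the lower bound becomes the lower bound; any value above the upper bound becomes the upper bound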
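Example: arr = [1, 2, 3, 4, 100], lower_pct = 0, upper_pct = 75
Result: [1.0, 2.0, 3.0, 4.0, 4.0]
Here the 0th percentile is 1 and the 75th percentile is 4, so only the value 100 is clipped, down to 4.
Unlike z-score thresholding, winsorization makes no distributional assumption; it is purely rank-based.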
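To illustrate the contrast with z-score thresholding, here is a small sketch on the same example array. The cutoff of |z| > 3 is an assumption (a common convention, not something stated above), and the population standard deviation is used because that is the default of arr.std().

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 100], dtype=float)

# Z-scores depend on the mean and standard deviation,
# both of which are inflated by the outlier itself.
z = (arr - arr.mean()) / arr.std()

# Assumed cutoff: flag values with |z| > 3.
flagged = np.abs(z) > 3

print(np.round(z, 2))
print(flagged)
```

On this array the z-score of 100 comes out at roughly 2.0, so under the assumed cutoff nothing is flagged, whereas percentile clipping with upper_pct = 75 still pulls 100 down to 4.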
Your task:
Implement winsorize(arr, lower_pct, upper_pct) using np.percentile and np.clip.
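One possible approach is sketched below. It assumes arr is a non-empty sequence of numbers and that the result should be returned as a plain Python list of floats; both are assumptions, and the local names values, lo, and hi are illustrative only.

```python
import numpy as np

def winsorize(arr, lower_pct, upper_pct):
    """Clip values in arr to the [lower_pct, upper_pct] percentile range."""
    values = np.asarray(arr, dtype=float)

    # Step 1: compute both percentile boundaries in one call.
    # np.percentile interpolates linearly between data points by default.
    lo, hi = np.percentile(values, [lower_pct, upper_pct])

    # Step 2: values below lo become lo, values above hi become hi.
    clipped = np.clip(values, lo, hi)

    # Assumption: callers want a list rather than an ndarray.
    return clipped.tolist()

print(winsorize([1, 2, 3, 4, 100], 0, 75))  # [1.0, 2.0, 3.0, 4.0, 4.0]
```

Because np.percentile interpolates between neighbouring values, the boundaries need not be elements of arr; a different interpolation method could give slightly different clipped values on other inputs.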
Example Tests
Single high outlier clipped to 75th percentile
Input: {"arr":[1,2,3,4,100],"lower_pct":0,"upper_pct":75}
Expected: [1,2,3,4,4]
Both tails clipped: low values raised, high values lowered
Input: {"arr":[0,10,20,30,40,50],"lower_pct":20,"upper_pct":80}
Expected: [10,10,20,30,40,40]
All identical values: clipping leaves array unchanged
Input: {"arr":[5,5,5,5],"lower_pct":25,"upper_pct":75}
Expected: [5,5,5,5]