Medium

Leakage-Free Feature Standardization

Medium

~20 min

code completion

Leakage-Free Feature Standardization

Standardization (z-score normalization) centers each feature to zero mean and unit variance:

z = \frac{x - μ}{σ}

The leakage trap: If you compute $μ$ and $σ$ over both train and test data, test information contaminates your scaler — your validation metrics will be optimistic and the model will underperform in production.

Correct approach:

1. Fit on X_train only: compute per-column $μ$ and $σ$ using np.mean(..., axis=0) and np.std(..., axis=0)

2. Transform X_test with the same $μ$ and $σ$

Edge case: If any column has $σ = 0$ (constant), set $σ = 1$ to avoid division by zero.

Your task:

Implement scale_with_train_stats(X_train, X_test) that returns standardized X_test.

Example Tests

2-feature train range [0,2]: test point [1,3] → [0.0, 2.0]

Input: {"X_test":[[1,3]],"X_train":[[0,0],[2,2]]}

Expected: [[0,2]]

Train with unit std: test point shifted and scaled correctly

Input: {"X_test":[[5,6]],"X_train":[[1,2],[3,4]]}

Expected: [[3,3]]

Single feature: two test points standardized by train stats

Input: {"X_test":[[2],[8]],"X_train":[[2],[4],[6]]}

Expected: [[-1.22474],[2.44949]]

You can read the full problem statement above. Create a free account to run code in the browser, submit solutions, and track your progress.