Leakage-Free Feature Standardization

Medium
~20 min
code completion

Leakage-Free Feature Standardization

Standardization (z-score normalization) centers each feature to zero mean and unit variance:

The leakage trap: If you compute and over both train and test data, test information contaminates your scaler — your validation metrics will be optimistic and the model will underperform in production.

Correct approach:

1. Fit on X_train only: compute per-column and using np.mean(..., axis=0) and np.std(..., axis=0)

2. Transform X_test with the same and

Edge case: If any column has (constant), set to avoid division by zero.

Your task:

Implement scale_with_train_stats(X_train, X_test) that returns standardized X_test.

Example Tests

2-feature train range [0,2]: test point [1,3] → [0.0, 2.0]

Input: {"X_test":[[1,3]],"X_train":[[0,0],[2,2]]}

Expected: [[0,2]]

Train with unit std: test point shifted and scaled correctly

Input: {"X_test":[[5,6]],"X_train":[[1,2],[3,4]]}

Expected: [[3,3]]

Single feature: two test points standardized by train stats

Input: {"X_test":[[2],[8]],"X_train":[[2],[4],[6]]}

Expected: [[-1.22474],[2.44949]]

Sign in to solve this problem

You can read the full problem statement above. Create a free account to run code in the browser, submit solutions, and track your progress.