Bootstrap Sampling
Bagging (Bootstrap Aggregating) trains each model in an ensemble on its own bootstrap sample: a sample of the training data drawn with replacement, of the same size as the original dataset.
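To make the "train on a bootstrap sample, then aggregate" idea concrete, here is a minimal sketch of a bagged regressor. The base learner (a cubic polynomial fitted with np.polyfit), the helper name bagged_poly_predict, and its parameters are illustrative assumptions, not part of the exercise; any model could take the polynomial's place.

```python
import numpy as np

def bagged_poly_predict(x, y, x_new, n_models=25, degree=3, seed=0):
    """Hypothetical bagging sketch: one polynomial per bootstrap sample, averaged.

    The polynomial base model is an illustrative assumption.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    preds = []
    for _ in range(n_models):
        # Bootstrap sample: n row indices drawn with replacement
        idx = rng.choice(n, size=n, replace=True)
        coeffs = np.polyfit(x[idx], y[idx], deg=degree)
        preds.append(np.polyval(coeffs, x_new))
    # Aggregation step: average the individual models' predictions
    return np.mean(preds, axis=0)

# Usage on noisy synthetic data
data_rng = np.random.default_rng(1)
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x) + data_rng.normal(scale=0.3, size=x.size)
x_new = np.linspace(0, 1, 5)
print(bagged_poly_predict(x, y, x_new))
```

Each model sees a slightly different resample, so the fitted curves differ; averaging them smooths out much of that model-to-model fluctuation.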
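On average, a bootstrap sample of n examples contains about 63.2% of the distinct original examples; the remaining ~36.8% of its entries are repeats of examples already drawn (equivalently, about 36.8% of the original examples are left out). This is because a given example is missed by all n draws with probability (1 - 1/n)^n, which approaches 1/e ≈ 0.368 as n grows. Since each model is trained on a different resample, the models make somewhat different errors, and averaging their predictions reduces the ensemble's variance.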
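The 63.2% figure can be checked directly. The sketch below compares the exact expected fraction of distinct examples, 1 - (1 - 1/n)^n, with a quick simulation; both helper names are hypothetical and only illustrate the arithmetic.

```python
import numpy as np

def expected_unique_fraction(n):
    # Probability that a given example appears at least once in n draws with replacement
    return 1.0 - (1.0 - 1.0 / n) ** n

def simulated_unique_fraction(n, trials=1000, seed=0):
    # Average fraction of distinct indices over many simulated bootstrap samples
    rng = np.random.default_rng(seed)
    fracs = [np.unique(rng.choice(n, size=n, replace=True)).size / n for _ in range(trials)]
    return float(np.mean(fracs))

for n in (10, 100, 10_000):
    print(n, round(expected_unique_fraction(n), 4), round(simulated_unique_fraction(n, trials=200), 4))
print("limit 1 - 1/e =", round(1 - np.exp(-1), 4))
```

For small n the fraction is a bit higher (about 0.651 for n = 10) and it decreases toward 1 - 1/e ≈ 0.632 as n grows.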
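In NumPy, the n row indices of a bootstrap sample can be drawn with:
import numpy as np
indices = np.random.choice(n, size=n, replace=True)  # n = number of training examples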
Your task:
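Implement bootstrap_sample(X, y, seed) that returns (X_sample, y_sample): bootstrapped versions of X and y, drawn using the given random seed. Index X and y with the same sampled row indices so that every sampled row keeps its own label. The exact indices (and therefore the expected outputs below) depend on which NumPy generator is used: np.random.default_rng(seed) and the legacy np.random.seed(seed) produce different sequences for the same seed.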
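A minimal sketch of one possible solution, assuming the modern np.random.default_rng(seed) generator. Whether it reproduces the exact expected values below depends on which generator the grader uses; for the legacy behavior, replace the rng lines with np.random.seed(seed) followed by np.random.choice.

```python
import numpy as np

def bootstrap_sample(X, y, seed):
    """Return one bootstrap sample (X_sample, y_sample) drawn with the given seed."""
    X = np.asarray(X)
    y = np.asarray(y)
    n = X.shape[0]
    if y.shape[0] != n:
        raise ValueError("X and y must have the same number of rows")
    # Assumption: Generator API; the legacy np.random.seed + np.random.choice
    # path yields different indices for the same seed.
    rng = np.random.default_rng(seed)
    # n row indices drawn uniformly with replacement
    indices = rng.choice(n, size=n, replace=True)
    # Use the SAME indices for X and y so each row stays paired with its label
    return X[indices], y[indices]

# Usage: labels are 10x the feature, so the pairing is easy to eyeball
X_s, y_s = bootstrap_sample([[1], [2], [3], [4], [5]], [10, 20, 30, 40, 50], seed=0)
print(X_s.ravel(), y_s)  # each y_s[i] equals 10 * X_s[i, 0]
```

Drawing one index array and applying it to both inputs is the key design choice: sampling X and y separately would scramble the feature/label pairs.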
Example Tests
Bootstrapped y_sample for seed 42
Input: {"X":[[1,2],[3,4],[5,6],[7,8]],"y":[0,1,0,1],"seed":42}
Expected: [0,1,0,1]
Bootstrapped y_sample for seed 0
Input: {"X":[[1],[2],[3],[4],[5]],"y":[1,2,3,4,5],"seed":0}
Expected: [5,4,3,2,2]
Seed=0 produces specific first label
Input: {"X":[[1],[2],[3],[4],[5]],"y":[10,20,30,40,50],"seed":0}
Expected: 50