Remove Duplicate Rows

Easy
~15 min
code completion


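Duplicate records inflate training data and can bias a model toward the repeated examples, which encourages overfitting. np.unique(X, axis=0) removes duplicate rows, but it returns them in sorted order. A production pipeline whose downstream logic depends on row order (for example, time-based processing) therefore needs a deduplication step that preserves first-appearance order.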
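For comparison, the sketch below shows the sorting behavior of np.unique(axis=0) on a small made-up array. It also shows one way to recover first-appearance order from it with the return_index option. This is a vectorized alternative offered for illustration, not the set-based strategy this problem asks for.

```python
import numpy as np

# Made-up example: unsorted rows, with [3, 4] repeated.
X = np.array([[3, 4], [1, 2], [3, 4], [0, 9]])

# np.unique with axis=0 returns the unique rows in sorted (lexicographic) order.
sorted_unique = np.unique(X, axis=0)  # → [[0, 9], [1, 2], [3, 4]]

# return_index=True also returns the index of each unique row's first
# occurrence in X; sorting those indices restores first-appearance order.
_, first_idx = np.unique(X, axis=0, return_index=True)
ordered_unique = X[np.sort(first_idx)]  # → [[3, 4], [1, 2], [0, 9]]
```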

Strategy:

1. Iterate through rows in order

2. Track rows already seen in a set, converting each row to a tuple (NumPy arrays are not hashable, but tuples are)

3. Keep only the first occurrence of each unique row
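The three steps above can be sketched in Python as follows. This is a minimal sketch, not a reference solution. It assumes X is a 2D array-like of numeric values. It records the indices of first occurrences and indexes X once at the end, so the result keeps X's dtype and stays 2D even when X has zero rows.

```python
import numpy as np

def remove_duplicate_rows(X):
    """Return the unique rows of X in order of first appearance (sketch)."""
    X = np.asarray(X)
    seen = set()   # rows already kept, stored as hashable tuples
    keep = []      # indices of each row's first occurrence
    for i, row in enumerate(X):
        key = tuple(row.tolist())  # convert the unhashable array row to a tuple
        if key not in seen:
            seen.add(key)
            keep.append(i)
    # An explicit integer dtype keeps indexing valid when `keep` is empty.
    return X[np.array(keep, dtype=np.intp)]
```

Each set lookup takes expected constant time, so the loop is roughly linear in the number of rows. One caveat of tuple-based equality: rows containing NaN are never considered equal to each other, because NaN != NaN.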

Example:

Input:   [[1, 2], [3, 4], [1, 2], [5, 6]]
Output:  [[1, 2], [3, 4], [5, 6]]   ← insertion order preserved

The duplicate [1, 2] at index 2 is dropped; [3, 4] and [5, 6] keep their original relative order.

Your task:

Implement remove_duplicate_rows(X) so that it returns a 2D NumPy array containing the unique rows of X in first-appearance order.

Example Tests

One duplicate at index 2 removed: output shape is (3, 2)

Input: {"X":[[1,2],[3,4],[1,2],[5,6]]}

Expected: [3,2]

First row of deduplicated output is correct

Input: {"X":[[1,2],[3,4],[1,2],[5,6]]}

Expected: [1,2]

Three occurrences of [1,1] collapsed to one: output shape is (3, 2)

Input: {"X":[[1,1],[2,2],[1,1],[1,1],[3,3]]}

Expected: [3,2]
