Detect Schema Drift

Easy
~12 min
code completion

Schema drift occurs when the structure of a dataset changes between pipeline runs: columns appear, disappear, or change type. Undetected schema drift can cause silent model degradation or runtime errors in production.

You are given two schema dictionaries mapping column names to their dtype strings (e.g., "int", "float", "str", "bool"). Identify the following:

  • Added: columns in new_schema but not in old_schema
  • Removed: columns in old_schema but not in new_schema
  • Type-changed: columns in both schemas but with different dtypes

Return a dict with keys "added", "removed", and "type_changed", each mapping to a sorted list of column names.

    Example:

    old = {"age": "int", "name": "str"}
    new = {"age": "float", "email": "str"}
    # Result:
    # {"added": ["email"], "removed": ["name"], "type_changed": ["age"]}

    Your task:

    Implement detect_schema_drift(old_schema, new_schema).
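
One possible approach, offered as a minimal sketch rather than the reference solution: treat the column names of each schema as sets, take set differences for added and removed columns, and compare dtypes only on the intersection. It assumes both arguments are plain dicts mapping column names to dtype strings, as the problem statement describes.

```python
def detect_schema_drift(old_schema, new_schema):
    """Compare two {column: dtype} dicts and report the differences."""
    # dict.keys() returns a set-like view, so - and & work directly on it.
    old_cols = old_schema.keys()
    new_cols = new_schema.keys()

    added = sorted(new_cols - old_cols)      # only in new_schema
    removed = sorted(old_cols - new_cols)    # only in old_schema
    # Columns present in both schemas whose dtype strings differ.
    type_changed = sorted(
        col for col in old_cols & new_cols
        if old_schema[col] != new_schema[col]
    )

    return {"added": added, "removed": removed, "type_changed": type_changed}


old = {"age": "int", "name": "str"}
new = {"age": "float", "email": "str"}
print(detect_schema_drift(old, new))
# {'added': ['email'], 'removed': ['name'], 'type_changed': ['age']}
```

Sorting each result keeps the output deterministic, since set operations do not guarantee any particular order.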

    Example Tests

    One new column added, one type changed, none removed

    Input: {"new_schema":{"a":"int","b":"str","c":"bool"},"old_schema":{"a":"float","b":"str"}}

    Expected: {"added":["c"],"removed":[],"type_changed":["a"]}

    One column removed, no other changes

    Input: {"new_schema":{"x":"int"},"old_schema":{"x":"int","y":"float"}}

    Expected: {"added":[],"removed":["y"],"type_changed":[]}

    Identical schemas: no drift detected

    Input: {"new_schema":{"name":"str"},"old_schema":{"name":"str"}}

    Expected: {"added":[],"removed":[],"type_changed":[]}
