Detect Schema Drift
Schema drift occurs when the structure of a dataset changes between pipeline runs: columns appear, disappear, or change type. If it goes undetected, schema drift can cause silent model degradation or runtime errors in production.
You are given two schema dictionaries mapping column names to their dtype strings (e.g., "int", "float", "str", "bool"). Identify the following:
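- added: columns present in new_schema but not in old_schema
- removed: columns present in old_schema but not in new_schema
- type_changed: columns present in both schemas whose dtype string differs
Return a dict with keys "added", "removed", and "type_changed", each mapping to a sorted list of column names.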
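The three categories reduce to set operations on the dictionary keys. Below is a minimal sketch, assuming Python 3, where `dict.keys()` views support `-` (difference) and `&` (intersection) directly. The sample schemas are made up for illustration.

```python
# Hypothetical sample schemas, for illustration only
old_schema = {"id": "int", "score": "float", "label": "str"}
new_schema = {"id": "int", "score": "str", "active": "bool"}

# Keys only in new_schema -> added; keys only in old_schema -> removed
added = sorted(new_schema.keys() - old_schema.keys())
removed = sorted(old_schema.keys() - new_schema.keys())

# Keys present in both schemas whose dtype string differs -> type_changed
type_changed = sorted(
    col for col in old_schema.keys() & new_schema.keys()
    if old_schema[col] != new_schema[col]
)

print(added, removed, type_changed)  # ['active'] ['label'] ['score']
```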
Example:
old = {"age": "int", "name": "str"}
new = {"age": "float", "email": "str"}
# Result:
# {"added": ["email"], "removed": ["name"], "type_changed": ["age"]}
Your task:
Implement detect_schema_drift(old_schema, new_schema).
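One possible implementation follows, written as a minimal sketch that assumes Python 3.9+ (for the built-in generic type hints). It uses the same key-set logic. Sorting each list makes the result deterministic regardless of dictionary insertion order. The example tests below supply inputs as JSON objects keyed by parameter name and list new_schema before old_schema. Unpacking the parsed object with `**` binds each schema by name, so the argument order cannot get mixed up.

```python
import json


def detect_schema_drift(
    old_schema: dict[str, str], new_schema: dict[str, str]
) -> dict[str, list[str]]:
    """Compare two {column: dtype} mappings and report added, removed, and retyped columns."""
    old_cols = old_schema.keys()
    new_cols = new_schema.keys()
    return {
        # Present in the new schema only
        "added": sorted(new_cols - old_cols),
        # Present in the old schema only
        "removed": sorted(old_cols - new_cols),
        # Present in both schemas, but the dtype string differs
        "type_changed": sorted(
            col for col in old_cols & new_cols if old_schema[col] != new_schema[col]
        ),
    }


# Usage: the example from the problem statement
old = {"age": "int", "name": "str"}
new = {"age": "float", "email": "str"}
print(detect_schema_drift(old, new))
# {'added': ['email'], 'removed': ['name'], 'type_changed': ['age']}

# Usage: an input in the JSON form used by the example tests, bound by keyword
raw = '{"new_schema":{"a":"int","b":"str","c":"bool"},"old_schema":{"a":"float","b":"str"}}'
print(detect_schema_drift(**json.loads(raw)))
# {'added': ['c'], 'removed': [], 'type_changed': ['a']}
```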
Example Tests
One new column added, one type changed, none removed
Input: {"new_schema":{"a":"int","b":"str","c":"bool"},"old_schema":{"a":"float","b":"str"}}
Expected: {"added":["c"],"removed":[],"type_changed":["a"]}
One column removed, no other changes
Input: {"new_schema":{"x":"int"},"old_schema":{"x":"int","y":"float"}}
Expected: {"added":[],"removed":["y"],"type_changed":[]}
Identical schemas: no drift detected
Input: {"new_schema":{"name":"str"},"old_schema":{"name":"str"}}
Expected: {"added":[],"removed":[],"type_changed":[]}