Chi-Squared Distribution Shift Test
The chi-squared statistic tests whether an observed categorical distribution differs from an expected one. In ML monitoring, you can use it to detect shifts in categorical feature distributions between reference and production data.
Given observed counts O_i and expected counts E_i for k categories:
chi2 = sum_i (O_i - E_i)^2 / E_i
Degrees of freedom = k - 1.
If any expected count is zero for a bin that has observed counts, skip that bin (the formula is undefined there).
Example:
observed = [50, 50, 50] (uniform actual)
expected = [60, 40, 50] (reference)
chi2 = (50-60)^2/60 + (50-40)^2/40 + (50-50)^2/50
= 100/60 + 100/40 + 0
= 1.6667 + 2.5 + 0 = 4.1667
Your task:
Implement chi2_stat(observed, expected) that returns the chi-squared statistic as a float.
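One possible implementation is sketched below in plain Python, with no external libraries. It rests on two assumptions that the statement does not spell out: every bin with a zero expected count is skipped (including bins where the observed count is also zero), and inputs of different lengths raise a ValueError. Treat it as a minimal sketch rather than a reference solution.

```python
from typing import Sequence


def chi2_stat(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Return the sum of (O_i - E_i)^2 / E_i over bins with a nonzero expected count."""
    # Assumption: mismatched lengths are treated as a caller error.
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")

    total = 0.0
    for o, e in zip(observed, expected):
        if e == 0:
            # Assumption: skip any bin whose expected count is zero,
            # because the term would divide by zero.
            continue
        total += (o - e) ** 2 / e
    return total


# Worked example from the text: 100/60 + 100/40 + 0
print(round(chi2_stat([50, 50, 50], [60, 40, 50]), 5))  # 4.16667
```

Skipping zero-expected bins keeps the function total, but it also hides a category that appears in production without ever appearing in the reference data, so a monitoring pipeline may want to flag such bins separately.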
Example Tests
3-category shift: chi2 = 100/60 + 100/40 + 0 = 4.1667
Input: {"expected":[60,40,50],"observed":[50,50,50]}
Expected: 4.16667
Identical distributions: chi2 = 0.0
Input: {"expected":[30,20,50],"observed":[30,20,50]}
Expected: 0
Binary case: 2-category deviation
Input: {"expected":[50,50],"observed":[40,60]}
Expected: 4