Merge Shard Statistics
In distributed processing, each worker independently computes statistics over its data shard. To get global statistics, you must merge these partial results without re-reading the raw data.
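Given two shards with counts $n_A, n_B$, means $\mu_A, \mu_B$, and population variances $\sigma_A^2, \sigma_B^2$, the merged statistics are:

$$\mu = \frac{n_A \mu_A + n_B \mu_B}{n}, \qquad \sigma^2 = \frac{n_A \sigma_A^2 + n_B \sigma_B^2 + \delta^2 \, \frac{n_A n_B}{n}}{n}$$

where $n = n_A + n_B$ is the merged count and $\delta = \mu_B - \mu_A$.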
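Everything in the variance formula is a count-weighted average except the $\delta^2$ term. A short derivation, splitting each shard's squared deviations about the merged mean $\mu$, shows where that term comes from (using the same symbols as above):

```latex
\begin{aligned}
\sum_{x \in A} (x - \mu)^2 &= n_A \sigma_A^2 + n_A (\mu_A - \mu)^2
  \quad \text{(cross term vanishes since } \textstyle\sum_{x \in A} (x - \mu_A) = 0\text{; same for } B) \\
\mu_A - \mu &= -\frac{n_B}{n}\,\delta, \qquad \mu_B - \mu = \frac{n_A}{n}\,\delta \\
n \sigma^2 &= n_A \sigma_A^2 + n_B \sigma_B^2 + n_A \frac{n_B^2}{n^2}\,\delta^2 + n_B \frac{n_A^2}{n^2}\,\delta^2 \\
           &= n_A \sigma_A^2 + n_B \sigma_B^2 + \frac{n_A n_B}{n}\,\delta^2
\end{aligned}
```

Dividing the last line by $n$ gives the merged variance.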
Example:
Shard A: n=3, mean=2.0, var=0.6667 (values [1,2,3])
Shard B: n=3, mean=5.0, var=0.6667 (values [4,5,6])
Merged: n=6, mean=3.5, var=2.9167 (values [1,2,3,4,5,6])

Your task:
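The example rows can be reproduced directly from the raw values, which confirms that "var" here means the population variance (divide by n, not n-1). The helper name `population_stats` is hypothetical and only used for this check; it relies on Python's standard `statistics` module.

```python
from statistics import fmean, pvariance


def population_stats(values):
    """Return (count, mean, population variance) of a list of numbers."""
    # pvariance divides by n, matching the population variance used in the task.
    return len(values), fmean(values), pvariance(values)


shard_a = [1, 2, 3]
shard_b = [4, 5, 6]

# Recompute each row of the example from the raw data.
for label, values in [("A", shard_a), ("B", shard_b), ("merged", shard_a + shard_b)]:
    n, mean, var = population_stats(values)
    print(f"{label}: n={n}, mean={mean}, var={var:.4f}")
```

This prints `A: n=3, mean=2.0, var=0.6667`, `B: n=3, mean=5.0, var=0.6667`, and `merged: n=6, mean=3.5, var=2.9167`, matching the table. A merge function should reach the last row without touching `shard_a + shard_b`.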
Implement merge_stats(nA, meanA, varA, nB, meanB, varB) returning (n, mean, variance) as a tuple.
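A minimal sketch of `merge_stats`, assuming numeric inputs and population variances as in the examples. Raising `ValueError` when both counts are zero is an assumption; the task does not say how two empty shards should be handled.

```python
def merge_stats(nA, meanA, varA, nB, meanB, varB):
    """Merge (count, mean, population variance) of two shards into one."""
    n = nA + nB
    if n == 0:
        # Assumption: two empty shards have no defined mean or variance.
        raise ValueError("cannot merge two empty shards")
    delta = meanB - meanA
    # Equivalent to (nA * meanA + nB * meanB) / n, written as a shift of meanA.
    mean = meanA + delta * nB / n
    # n * variance is each shard's sum of squared deviations; the delta**2 term
    # adds the spread caused by the two shard means being apart.
    m2 = nA * varA + nB * varB + delta * delta * nA * nB / n
    return n, mean, m2 / n
```

With the first test's inputs, `merge_stats(3, 2, 0.6667, 3, 5, 0.6667)` returns `n=6`, `mean=3.5`, and a variance that rounds to `2.9167`. Writing the mean as `meanA + delta * nB / n` is algebraically the same as the weighted average; it is the form commonly used in pairwise variance updates, though the task does not require it.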
Example Tests
Two equal-sized shards [1,2,3] and [4,5,6]: merged mean=3.5
Input: {"nA":3,"nB":3,"varA":0.6667,"varB":0.6667,"meanA":2,"meanB":5}
Expected: [6,3.5,2.9167]
Merging with a single-element shard of known value
Input: {"nA":2,"nB":1,"varA":1,"varB":0,"meanA":3,"meanB":6}
Expected: [3,4,2.6667]
Identical shards: merged mean and variance equal the shard mean and variance
Input: {"nA":4,"nB":4,"varA":2,"varB":2,"meanA":5,"meanB":5}
Expected: [8,5,2]