Merge Shard Statistics

Medium
~20 min
code completion

In distributed processing, each worker independently computes statistics over its data shard. To get global statistics, you must merge these partial results without re-reading the raw data.

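Given two shards with counts $n_A$ and $n_B$, means $\mu_A$ and $\mu_B$, and population variances $\sigma_A^2$ and $\sigma_B^2$, the merged statistics are:

$$n = n_A + n_B, \qquad \mu = \frac{n_A \mu_A + n_B \mu_B}{n}, \qquad \sigma^2 = \frac{n_A\left(\sigma_A^2 + \delta_A^2\right) + n_B\left(\sigma_B^2 + \delta_B^2\right)}{n}$$

where $\delta_A = \mu_A - \mu$ and $\delta_B = \mu_B - \mu$.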
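The merged variance follows from splitting each shard's squared deviations around the merged mean. The short derivation below is a sketch under assumed notation: $n_A$, $\mu_A$, $\sigma_A^2$ denote shard A's count, mean, and population variance, $\mu$ is the merged mean, $n = n_A + n_B$, and shard B is handled symmetrically.

```latex
\begin{aligned}
\sum_{i \in A} (x_i - \mu)^2
  &= \sum_{i \in A} \bigl((x_i - \mu_A) + (\mu_A - \mu)\bigr)^2 \\
  &= \sum_{i \in A} (x_i - \mu_A)^2
     + 2(\mu_A - \mu)\sum_{i \in A} (x_i - \mu_A)
     + n_A(\mu_A - \mu)^2 \\
  &= n_A \sigma_A^2 + n_A(\mu_A - \mu)^2
  \quad\text{since } \sum_{i \in A} (x_i - \mu_A) = 0, \\[4pt]
\sigma^2
  &= \frac{1}{n}\Bigl(\sum_{i \in A} (x_i - \mu)^2 + \sum_{i \in B} (x_i - \mu)^2\Bigr)
   = \frac{n_A\bigl(\sigma_A^2 + (\mu_A - \mu)^2\bigr) + n_B\bigl(\sigma_B^2 + (\mu_B - \mu)^2\bigr)}{n}.
\end{aligned}
```

Only the counts, means, and variances appear on the right-hand side, which is why the raw values never need to be re-read.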

Example:

  • Shard A: n=3, mean=2.0, var=0.6667 (values [1,2,3])
  • Shard B: n=3, mean=5.0, var=0.6667 (values [4,5,6])
  • Merged: n=6, mean=3.5, var=2.9167 (values [1,2,3,4,5,6])
Your task:

Implement merge_stats(nA, meanA, varA, nB, meanB, varB) returning (n, mean, variance) as a tuple.
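One possible shape for the function is sketched below. It is not an official solution: the signature comes from the task statement, while the handling of two empty shards (returning zeros) is an assumption made here, since the problem does not say what should happen in that case.

```python
def merge_stats(nA, meanA, varA, nB, meanB, varB):
    """Merge count, mean, and population variance of two shards."""
    n = nA + nB
    if n == 0:
        # Assumption (not specified by the problem): merging two empty
        # shards yields an empty result with zeroed statistics.
        return (0, 0.0, 0.0)

    # Merged mean is the count-weighted average of the shard means.
    mean = (nA * meanA + nB * meanB) / n

    # Offset of each shard mean from the merged mean.
    dA = meanA - mean
    dB = meanB - mean

    # Each shard contributes its own spread plus its squared offset.
    variance = (nA * (varA + dA * dA) + nB * (varB + dB * dB)) / n
    return (n, mean, variance)


n, mean, var = merge_stats(3, 2.0, 0.6667, 3, 5.0, 0.6667)
print(n, mean, round(var, 4))  # 6 3.5 2.9167
```

Because the inputs carry variances rounded to four decimals, comparing the result against expected values with a small tolerance is safer than testing for exact equality.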

Example Tests

Two equal shards [1,2,3] and [4,5,6]: merged mean=3.5
Input: {"nA":3,"nB":3,"varA":0.6667,"varB":0.6667,"meanA":2,"meanB":5}
Expected: [6,3.5,2.9167]

Merging with a single-element shard of known value
Input: {"nA":2,"nB":1,"varA":1,"varB":0,"meanA":3,"meanB":6}
Expected: [3,4,2.6667]

Identical shards: merged mean equals shard mean
Input: {"nA":4,"nB":4,"varA":2,"varB":2,"meanA":5,"meanB":5}
Expected: [8,5,2]
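The expected values can be cross-checked by recomputing statistics directly from raw data, which is what a merge is meant to avoid in production but is handy for testing. The sketch below uses Python's standard statistics module. Only the first pair of value lists comes from the problem; the raw values for the second and third cases ([2,4] with [6], and [3,5,5,7] twice) are assumptions chosen here to match the stated counts, means, and variances, and shard_stats is a hypothetical helper name.

```python
from statistics import fmean, pvariance


def shard_stats(values):
    """Return (n, mean, population variance) computed from raw values."""
    return len(values), fmean(values), pvariance(values)


# (shard A values, shard B values); cases 2 and 3 are assumed raw data
# consistent with the inputs in the example tests.
cases = [
    ([1, 2, 3], [4, 5, 6]),
    ([2, 4], [6]),
    ([3, 5, 5, 7], [3, 5, 5, 7]),
]

for a, b in cases:
    # Ground truth: statistics of the concatenated raw values.
    n, mean, var = shard_stats(a + b)
    print(n, round(mean, 4), round(var, 4))
```

Each printed triple should agree, to four decimals, with the Expected line of the corresponding test.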
