Formula to recalculate population variance after removing a value

Time:01-13

Let's say I have a data set of {10, 20, 30}. Its mean is 20 and its population variance is 66.667. Is there a formula that lets me calculate the new variance if I were to remove 10 from the data set, turning it into {20, 30}?

This is similar to https://math.stackexchange.com/questions/3112650/formula-to-recalculate-variance-after-removing-a-value-and-adding-another-one-gi, which deals with the case where the removed value is replaced by another one. https://math.stackexchange.com/questions/775391/can-i-calculate-the-new-standard-deviation-when-adding-a-value-without-knowing-t is also similar, except that it deals with adding a value instead of removing one. The question "Removing a prior sample while using Welford's method for computing single pass variance" deals with removing a sample, but I cannot figure out how to adapt it to the population variance.

CodePudding user response:

To compute the mean and the population variance we need three quantities:

N   - number of items 
Sx  - sum of items
Sxx - sum of items squared

Having all these values we can find mean and variance as

Mean     = Sx / N
Variance = Sxx / N - Sx * Sx / N / N
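
As a minimal sketch, the two formulas above could be written in Python as follows. The function name `mean_and_variance` and its parameter names are hypothetical, chosen only for illustration; they are not part of the original answer.

```python
def mean_and_variance(n, sx, sxx):
    """Return (mean, population variance) from the count n,
    the sum of items sx, and the sum of squared items sxx."""
    if n <= 0:
        raise ValueError("n must be positive")
    mean = sx / n
    # Same as Sxx / N - Sx * Sx / N / N
    variance = sxx / n - mean * mean
    return mean, variance


# Build the three running totals from a list of items.
items = [10, 20, 30]
n = len(items)
sx = sum(items)
sxx = sum(x * x for x in items)

mean, variance = mean_and_variance(n, sx, sxx)
```

Only `n`, `sx` and `sxx` have to be stored; the items themselves are not needed again once the totals are known.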

In your case

items    = {10, 20, 30}

N        = 3
Sx       = 60   = 10 + 20 + 30
Sxx      = 1400 = 100 + 400 + 900 = 10 * 10 + 20 * 20 + 30 * 30

Mean     = 60 / 3 = 20
Variance = 1400 / 3 - 60 * 60 / 3 / 3 = 66.666667  

If you want to remove an item, just update N, Sx, Sxx values and compute a new variance:

item      = 10

N'        = N - 1             = 3 - 1 = 2
Sx'       = Sx - item         = 60 - 10 = 50
Sxx'      = Sxx - item * item = 1400 - 10 * 10 = 1300

Mean'     = Sx' / N' = 50 / 2 = 25
Variance' = Sxx' / N' - Sx' * Sx' / N' / N' = 1300 / 2 - 50 * 50 / 2 / 2 = 25

So if you remove item = 10 the new mean and variance will be

Mean'     = 25
Variance' = 25
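
The removal step can be sketched in Python like this. It is an illustration under the assumption that N, Sx and Sxx are kept as running totals; the function name `remove_item` is hypothetical and not part of the original answer.

```python
def remove_item(n, sx, sxx, item):
    """Remove one item from the running totals and return
    (n', sx', sxx', mean', variance') for the remaining items."""
    if n <= 1:
        raise ValueError("cannot remove the last remaining item")
    n_new = n - 1
    sx_new = sx - item
    sxx_new = sxx - item * item
    mean_new = sx_new / n_new
    # Population variance: Sxx' / N' - Sx' * Sx' / N' / N'
    variance_new = sxx_new / n_new - mean_new * mean_new
    return n_new, sx_new, sxx_new, mean_new, variance_new


# Totals for {10, 20, 30}: N = 3, Sx = 60, Sxx = 1400. Remove item = 10.
n2, sx2, sxx2, mean2, var2 = remove_item(3, 60, 1400, 10)
print(mean2, var2)
```

One caveat on this design: with floating-point data whose mean is large compared with its spread, subtracting two nearly equal numbers in the variance formula can lose precision, so for such data a Welford-style update may be the safer choice.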