numpy standard deviation does not give the same result as scipy stats standard deviation


Scipy and numpy standard deviation methods give slightly different results. I don't understand why. Can anyone explain that to me?

Here is an example.

import numpy as np
import scipy.stats
ar = np.arange(20)
print(np.std(ar))
print(scipy.stats.tstd(ar))

returns

5.766281297335398
5.916079783099616

CodePudding user response:

I ran into this a while ago. To get the same values, pass ddof=1 to np.std():

import numpy as np
import scipy.stats
ar = np.arange(20)
print(np.std(ar, ddof=1))
print(scipy.stats.tstd(ar))

Output:

5.916079783099616
5.916079783099616

My mentor used to say:

--> ddof=1 if you're calculating np.std() for a sample taken from your complete dataset.

--> ddof=0 if you're calculating it for the full population.
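To see why the distinction matters, here is a quick sketch (the distribution parameters and seed are arbitrary) that draws a small sample from a larger population and compares both divisors:

import numpy as np

rng = np.random.default_rng(0)      # arbitrary seed, for reproducibility
population = rng.normal(loc=0, scale=5, size=100_000)
sample = rng.choice(population, size=20, replace=False)

print(np.std(population))           # full population: ddof=0 is appropriate
print(np.std(sample))               # divides by n, tends to underestimate the spread
print(np.std(sample, ddof=1))       # Bessel's correction: divides by n - 1

On average, the ddof=1 estimate is closer to the true population spread than the uncorrected one, which is why it is the default for sample statistics.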

CodePudding user response:

With np.std() you are computing the population standard deviation, which divides by n:

x = np.abs(ar - ar.mean())**2
std = np.sqrt(np.sum(x) / len(ar)) # 5.766281297335398

However, scipy.stats.tstd computes the trimmed sample standard deviation, which with no limits set reduces to the plain sample standard deviation, dividing by n - 1:

x = np.abs(ar - ar.mean())**2
std = np.sqrt(np.sum(x) / (len(ar) - 1)) # 5.916079783099616

Note that np.std() takes the square root of the mean of x (the sum of x divided by its length n), while scipy.stats.tstd divides by n - 1 instead, n being the length of the array.
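Worth noting: the "trimmed" part of tstd only comes into play when you pass the limits argument; without it, tstd is just the sample standard deviation. A quick sketch (the limits chosen here are arbitrary):

import numpy as np
import scipy.stats

ar = np.arange(20)

# Without limits, tstd equals np.std(ar, ddof=1)
print(scipy.stats.tstd(ar))                        # 5.916079783099616
print(np.std(ar, ddof=1))                          # 5.916079783099616

# With limits, values outside [5, 15] are discarded before computing the std
# (limits are inclusive by default)
print(scipy.stats.tstd(ar, limits=(5, 15)))
print(np.std(ar[(ar >= 5) & (ar <= 15)], ddof=1))  # same result computed by hand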
