The default numpy.std() is only applicable for samples where n > 30. Can this be modified?

Time:09-10

Say I need to find the sample standard deviation of the data set sampleA = [34.6, 40.7, 37.5, 45.8, 41.4, 44.2, 44.5, 51.8, 47.5, 45.4, 36.4, 46.2, 43.0, 43.3, 42.0] # mass (g). Using the np.std() function, we obtain the result 4.309. However, this is incorrect because n = 15 for sample A, which requires a change in the standard deviation formula due to Student's t-distribution.

The correct function would look something like this:

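def sam_mean(data_set):
   # helper assumed by the original snippet: arithmetic mean
   return sum(data_set) / len(data_set)

def sam_stddev(data_set):
   mean = sam_mean(data_set)  # compute the mean once, not per element
   total = 0  # avoid shadowing the built-in sum()
   for x in data_set:
      total += (x - mean) ** 2  # accumulate squared deviations
   S_x = (total / (len(data_set) - 1)) ** 0.5  # divide by n - 1
   return S_x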

By using this function, we obtain the correct result of 4.460. Now, obviously I can just use this defined function, but I was wondering whether there is some kind of modifier that I can use, maybe something like np.std(sampleA, "n" = 15) that would allow me to do this by default. Alternatively, is there another library I should be using that has this built-in? I have looked at the numpy.std() documentation at https://numpy.org/doc/stable/reference/generated/numpy.std.html, but honestly, I'm inexperienced with how to actually read that.

CodePudding user response:

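By default, NumPy divides by N (the population standard deviation). If you want to divide by N - 1 instead (the sample standard deviation), set the ddof parameter to 1; the divisor NumPy uses is N - ddof. This correction applies to a sample of any size, not only when n is below 30.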

import numpy as np
np_arr = np.array(sampleA)  # sampleA as defined in the question
np_arr.std(ddof=1)
4.460119579861807
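On the question of another library with this built in: a minimal sketch using only Python's standard-library statistics module, where pstdev divides by n (matching the np.std default) and stdev divides by n - 1 (matching ddof=1). The variable names pop_sd and sam_sd are illustrative, and the data is the sampleA list from the question.

```python
import math
import statistics

sampleA = [34.6, 40.7, 37.5, 45.8, 41.4, 44.2, 44.5, 51.8,
           47.5, 45.4, 36.4, 46.2, 43.0, 43.3, 42.0]  # mass (g)

n = len(sampleA)
pop_sd = statistics.pstdev(sampleA)  # divides by n, like np.std(sampleA)
sam_sd = statistics.stdev(sampleA)   # divides by n - 1, like np.std(sampleA, ddof=1)

print(round(pop_sd, 3))  # 4.309
print(round(sam_sd, 3))  # 4.46

# The two results differ only by the factor sqrt(n / (n - 1)).
print(math.isclose(sam_sd, pop_sd * math.sqrt(n / (n - 1))))  # True
```

The last line shows why the gap shrinks as n grows: the factor sqrt(n / (n - 1)) approaches 1, so for large samples the two functions give nearly the same value.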