I'm trying to calculate the standard deviation in Python without using numpy or any external library except for math. I want to get better at writing algorithms and am doing this as a bit of "homework" to improve my Python skills. My goal is to translate this formula into Python, but I'm not getting the correct result.
I'm using an array of speeds where speeds = [86,87,88,86,87,85,86]
When I run:
std_dev = numpy.std(speeds)
print(std_dev)
I get: 0.903507902905. But I don't want to rely on numpy. So...
My implementation is as follows:
import math

speeds = [86,87,88,86,87,85,86]

def get_mean(array):
    sum = 0
    for i in array:
        sum = sum + i
    mean = sum/len(array)
    return mean

def get_std_dev(array):
    # get mu
    mean = get_mean(array)
    # (x[i] - mu)**2
    for i in array:
        array = (i - mean) ** 2
        return array
    sum_sqr_diff = 0
    # get sigma
    for i in array:
        sum_sqr_diff = sum_sqr_diff + i
        return sum_sqr_diff
    # get mean of squared differences
    variance = 1/len(array)
    mean_sqr_diff = (variance * sum_sqr_diff)
    std_dev = math.sqrt(mean_sqr_diff)
    return std_dev

std_dev = get_std_dev(speeds)
print(std_dev)
Now when I run:
std_dev = get_std_dev(speeds)
print(std_dev)
I get: [0]
but I am expecting 0.903507902905
What am I missing here?
CodePudding user response:
speeds = [86,87,88,86,87,85,86]
# Calculate the mean of the values in your list
mean_speeds = sum(speeds) / len(speeds)
# Calculate the variance of the values in your list
# This is 1/N * sum((x - mean(X))^2)
var_speeds = sum((x - mean_speeds) ** 2 for x in speeds) / len(speeds)
# Take the square root of variance to get standard deviation
sd_speeds = var_speeds ** 0.5
>>> sd_speeds
0.9035079029052513
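To see where that number comes from, here is a small sketch that repeats the same three steps with exact fractions from the standard library, so each intermediate value can be checked by hand. Using fractions.Fraction is my own choice for illustration and is not needed for the actual calculation.

```python
from fractions import Fraction
import math

speeds = [86, 87, 88, 86, 87, 85, 86]
n = len(speeds)

# Step 1: the mean, kept as an exact fraction instead of a float
mean = Fraction(sum(speeds), n)

# Step 2: the sum of squared deviations from the mean
sum_sqr_diff = sum((x - mean) ** 2 for x in speeds)

# Step 3: the population variance divides that sum by n
variance = sum_sqr_diff / n

print(mean, sum_sqr_diff, variance)  # 605/7 40/7 40/49
print(math.sqrt(variance))           # square root of 40/49
```

The square root of 40/49 is the 0.9035... value shown above.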
CodePudding user response:
There are a few problems in the code. One of them is the return statement inside the for loop. You can try this:
import math

def get_mean(array):
    return sum(array) / len(array)

def get_std_dev(array):
    n = len(array)
    mean = get_mean(array)
    # collect the squared difference of each item from the mean
    squares_arr = []
    for item in array:
        squares_arr.append((item - mean) ** 2)
    return math.sqrt(sum(squares_arr) / n)
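The effect of a return inside a for loop can be seen in isolation with a minimal sketch. The two helper names below are hypothetical and only exist for this demonstration.

```python
def first_squared_diff(values, mean):
    # The return sits inside the loop body, so the function exits
    # during the first iteration and never sees the other elements.
    for v in values:
        return (v - mean) ** 2


def all_squared_diffs(values, mean):
    # The return sits after the loop, so every element is processed first.
    result = []
    for v in values:
        result.append((v - mean) ** 2)
    return result


speeds = [86, 87, 88, 86, 87, 85, 86]
mean = sum(speeds) / len(speeds)

early = first_squared_diff(speeds, mean)  # a single float, not a list
full = all_squared_diffs(speeds, mean)    # one value per input element
print(type(early).__name__, len(full))
```

The first helper hands back only the squared difference of the first element, which is why the code in the question never reaches the square root step.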
CodePudding user response:
You need to get rid of the return statements inside the for loops.
def get_std_dev(array):
    # get mu
    mean = get_mean(array)
    sum_sqr_diff = 0
    # get sigma
    for i in array:
        sum_sqr_diff = sum_sqr_diff + (i - mean)**2
    # get mean of squared differences
    variance = 1/len(array)
    mean_sqr_diff = (variance * sum_sqr_diff)
    std_dev = math.sqrt(mean_sqr_diff)
    return std_dev
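One way to check a fix like this is to compare it against the value expected in the question. The sketch below is a self-contained variant of the loop version; the tolerance-based comparison with math.isclose is my own addition, and the expected value is the full-precision figure reported elsewhere in this thread.

```python
import math


def get_std_dev(values):
    # Population standard deviation: square root of the mean squared deviation.
    mean = sum(values) / len(values)
    sum_sqr_diff = 0.0
    for v in values:
        sum_sqr_diff += (v - mean) ** 2
    return math.sqrt(sum_sqr_diff / len(values))


speeds = [86, 87, 88, 86, 87, 85, 86]
expected = 0.9035079029052513  # value reported for this data in the thread

result = get_std_dev(speeds)
# Compare floats with a tolerance rather than with ==
print(math.isclose(result, expected))
```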
CodePudding user response:
If you don't want to use numpy, that's fine. Give the statistics module from Python's standard library a try:
import statistics
st_dev = statistics.pstdev(speeds)
print(st_dev)
Or, if you still want a custom solution, I recommend the following approach, which uses a generator expression instead of explicit loops:
import math
mean = sum(speeds) / len(speeds)
var = sum((l-mean)**2 for l in speeds) / len(speeds)
st_dev = math.sqrt(var)
print(st_dev)
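Note that numpy.std defaults to ddof=0, which is the population standard deviation (divide by n), and that is what statistics.pstdev computes. statistics.stdev divides by n - 1 instead and gives a slightly larger value. A minimal sketch of the difference, with variable names chosen only for illustration:

```python
import math
import statistics

speeds = [86, 87, 88, 86, 87, 85, 86]
n = len(speeds)
mean = sum(speeds) / n
sum_sqr_diff = sum((x - mean) ** 2 for x in speeds)

# Population standard deviation: divide by n
population_sd = math.sqrt(sum_sqr_diff / n)
# Sample standard deviation: divide by n - 1 (Bessel's correction)
sample_sd = math.sqrt(sum_sqr_diff / (n - 1))

print(math.isclose(population_sd, statistics.pstdev(speeds)))
print(math.isclose(sample_sd, statistics.stdev(speeds)))
```

If your result ever differs from numpy's by a small amount, the choice of divisor is worth checking first.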