How do I calculate standard deviation in python without using numpy?

Time:11-24

I'm trying to calculate standard deviation in Python without numpy or any external library except for math. I want to get better at writing algorithms, and I'm doing this as a bit of "homework" while I improve my Python skills. My goal is to translate this formula into Python, but I am not getting the correct result.
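The formula itself did not survive on this page. Judging from the comments in the code below (mu, the squared differences, the sum, and the division by the length of the list), it is presumably the population standard deviation, which is also what numpy.std computes by default:

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i,
\qquad
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left(x_i - \mu\right)^2}
```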

I'm using an array of speeds where speeds = [86,87,88,86,87,85,86]

When I run:

import numpy

std_dev = numpy.std(speeds)
print(std_dev)

I get: 0.903507902905. But I don't want to rely on numpy. So...

My implementation is as follows:

import math

speeds = [86,87,88,86,87,85,86]

def get_mean(array):
    sum = 0
    for i in array:
        sum = sum + i
    mean = sum/len(array)
    return mean

def get_std_dev(array):
    # get mu
    mean = get_mean(array)
    # (x[i] - mu)**2
    for i in array:
        array = (i - mean) ** 2
        return array
    sum_sqr_diff = 0
    # get sigma
    for i in array:
        sum_sqr_diff = sum_sqr_diff + i
        return sum_sqr_diff
    # get mean of squared differences
    variance = 1/len(array)
    mean_sqr_diff = (variance * sum_sqr_diff)
    
    std_dev = math.sqrt(mean_sqr_diff)
    return std_dev

std_dev = get_std_dev(speeds)
print(std_dev)

Now when I run:

std_dev = get_std_dev(speeds)
print(std_dev)

I get: [0] but I am expecting 0.903507902905

What am I missing here?

Answer:

speeds = [86,87,88,86,87,85,86]

# Calculate the mean of the values in your list
mean_speeds = sum(speeds) / len(speeds)

# Calculate the variance of the values in your list
# This is 1/N * sum((x - mean(X))^2)
var_speeds = sum((x - mean_speeds) ** 2 for x in speeds) / len(speeds)

# Take the square root of variance to get standard deviation
sd_speeds = var_speeds ** 0.5

>>> sd_speeds
0.9035079029052513
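As a cross-check of that output, here is a minimal sketch that redoes the same computation in exact arithmetic with the standard-library fractions module (my choice for illustration, not part of the answer). The mean is 605/7, the squared deviations sum to 40/7, the population variance is 40/49, and the standard deviation is sqrt(40)/7, roughly 0.9035:

```python
from fractions import Fraction
import math

speeds = [86, 87, 88, 86, 87, 85, 86]

# Exact rational arithmetic: no rounding happens until the final square root
mean = Fraction(sum(speeds), len(speeds))                    # 605/7
sq_dev_sum = sum((Fraction(x) - mean) ** 2 for x in speeds)  # 40/7
variance = sq_dev_sum / len(speeds)                          # 40/49
std_dev = math.sqrt(variance)                                # sqrt(40)/7 as a float

print(mean, sq_dev_sum, variance)  # 605/7 40/7 40/49
print(std_dev)
```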

Answer:

There are a few problems in your code. One of them is the return statement inside the for loop, which exits the function during the first iteration. You can try this:

import math


def get_mean(array):
    return sum(array) / len(array)


def get_std_dev(array):
    n = len(array)
    mean = get_mean(array)
    squares_arr = []
    for item in array:
        squares_arr.append((item - mean) ** 2)
    return math.sqrt(sum(squares_arr) / n)
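To see why the placement of return matters, here is a small self-contained sketch contrasting the two patterns (the names squares_buggy and squares_fixed are made up for illustration):

```python
speeds = [86, 87, 88, 86, 87, 85, 86]
mean = sum(speeds) / len(speeds)


def squares_buggy(values, mean):
    # Pattern from the question: return sits inside the loop body,
    # so the function exits during the very first iteration.
    for v in values:
        return (v - mean) ** 2


def squares_fixed(values, mean):
    # Collect every squared difference, then return once the loop has finished.
    result = []
    for v in values:
        result.append((v - mean) ** 2)
    return result


print(squares_buggy(speeds, mean))       # a single float: (86 - mean) ** 2
print(len(squares_fixed(speeds, mean)))  # 7
```

Note that the question's loop also rebinds array to a single number, so simply deleting that return would not be enough: the later "for i in array:" would then raise a TypeError because a float is not iterable.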

Answer:

You need to get rid of the return statements inside the for loops and accumulate the squared differences directly:

def get_std_dev(array):
    # get mu
    mean = get_mean(array)
    sum_sqr_diff = 0
    # get sigma
    for i in array:
        sum_sqr_diff = sum_sqr_diff + (i - mean)**2
    # get mean of squared differences
    variance = 1/len(array)
    mean_sqr_diff = (variance * sum_sqr_diff)
    
    std_dev = math.sqrt(mean_sqr_diff)
    return std_dev
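One naming detail in that version: the variable called variance actually holds the factor 1/N, and the real variance is mean_sqr_diff. Here is a sketch of the same loop with names that follow the formula; it is purely a renaming, and scale is my own name, not from the answer:

```python
import math

speeds = [86, 87, 88, 86, 87, 85, 86]


def get_std_dev(array):
    mean = sum(array) / len(array)
    sum_sqr_diff = 0.0
    for x in array:
        sum_sqr_diff += (x - mean) ** 2
    # The factor 1/N (the answer stores this in a variable named "variance")
    scale = 1 / len(array)
    # The actual variance: the mean of the squared differences
    variance = scale * sum_sqr_diff
    return math.sqrt(variance)


print(get_std_dev(speeds))
```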

Answer:

If you don't want to use numpy, that's fine: give the statistics module from Python's standard library a try. statistics.pstdev returns the population standard deviation:

import statistics

st_dev = statistics.pstdev(speeds)
print(st_dev)
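numpy.std divides by N by default (its ddof argument defaults to 0), which is why pstdev, not stdev, matches the 0.9035 from the question; statistics.stdev divides by N - 1 instead. A hand-rolled sketch that mirrors the ddof idea (the function name std_dev is hypothetical, not a library API) and checks itself against both statistics functions:

```python
import math
import statistics

speeds = [86, 87, 88, 86, 87, 85, 86]


def std_dev(values, ddof=0):
    """Standard deviation using a divisor of N - ddof.

    ddof=0 gives the population value (like statistics.pstdev),
    ddof=1 gives the sample value (like statistics.stdev).
    """
    n = len(values)
    if n - ddof <= 0:
        raise ValueError("need more data points than ddof")
    mean = sum(values) / n
    sq_dev_sum = sum((x - mean) ** 2 for x in values)
    return math.sqrt(sq_dev_sum / (n - ddof))


print(std_dev(speeds))          # population standard deviation
print(std_dev(speeds, ddof=1))  # sample standard deviation
print(statistics.pstdev(speeds), statistics.stdev(speeds))
```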

Or, if you still want a custom solution, I recommend the following approach, which uses a generator expression instead of your more complex and buggy loops:

import math

mean = sum(speeds) / len(speeds)
var = sum((l-mean)**2 for l in speeds) / len(speeds)
st_dev = math.sqrt(var)
print(st_dev)