I am working on a requirement where I have to compute a value for an entire column based on a formula. Here is my code:
import pandas as pd
import numpy as np
s={'Fruits':['Apple','Orange', 'Banana', 'Mango'],
'month':['201401','201502','201603','201604'],'weight':[2,4,1,6],
'Quant':[132,178,298,300]}
p=pd.DataFrame(data=s)
n=len(p)
std_dev=((1)/(n-1))*(sum([(p['Quant'] - p['weight']) ** 2 for _ in range(n)]))
alpha=2
std_devf = p['Quant'] + alpha * std_dev
I expect std_devf to be a single value (e.g. 100 or 200).
But the output I get is one value per fruit:
0 45198.666667
1 80914.000000
2 235522.000000
3 230796.000000
How can I get just a single value from this formula? Is my formula the reason I get one value per row?
Answer:
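First of all, the std_dev formula needs to be fixed. You are creating a list of 4 identical Series (one copy of the whole column per row) and summing them, which multiplies every squared difference by 4 instead of summing across the rows; the square root is also missing. The link you provided does not compute it that way. According to the link, it should be like this: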
n = len(p)
std_dev = (1/(n-1)*(sum([(p['Quant'][i] - p['weight'][i]) ** 2 for i in range(n)]))) ** 0.5
alpha = 2
std_devf = p['Quant'] + alpha * std_dev
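A minimal sketch of the difference between the two formulas, assuming the same four-row DataFrame from the question. The list comprehension in the question repeats the whole column n times, so summing the list multiplies every squared difference by n and still returns a Series; summing over the rows with Series.sum gives one scalar. The names sq_diff, buggy and fixed are illustrative only.

```python
import pandas as pd

# Same data as in the question
p = pd.DataFrame({
    'Fruits': ['Apple', 'Orange', 'Banana', 'Mango'],
    'month': ['201401', '201502', '201603', '201604'],
    'weight': [2, 4, 1, 6],
    'Quant': [132, 178, 298, 300],
})
n = len(p)
sq_diff = (p['Quant'] - p['weight']) ** 2  # one squared difference per row

# Question's version: a list of n copies of the whole Series, summed element-wise.
# The result is still a Series, equal to n * sq_diff / (n - 1).
buggy = (1 / (n - 1)) * sum([sq_diff for _ in range(n)])

# Corrected version: sum over the rows first, then take the square root.
# Series.sum() returns a single number, so fixed is a scalar.
fixed = (sq_diff.sum() / (n - 1)) ** 0.5

print(type(buggy).__name__, type(fixed).__name__)
print(round(fixed, 2))
```

Adding p['Quant'] to buggy reproduces the per-fruit numbers from the question, which is why the output there had one large value per row.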
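Note that std_dev is now a single number; std_devf is still one value per row, because it adds that number to every entry of p['Quant']. Are you looking for the expected value of std_devf, or for the bound limit? If it is the bound limit, the result will have decimals as in the link, but you can always round it to two decimals: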
round(std_devf,2)
Out[33]:
0 675.84
1 721.84
2 841.84
3 843.84
Name: Quant, dtype: float64
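The sketch below, under the same assumptions (the question's weight and Quant columns), shows why the rounded result is still four numbers: std_dev is a scalar, but adding it to the p['Quant'] column broadcasts it to every row. If a single expected value is what you are after, one option is to take the mean of std_devf; this is an assumption about what you want, not something the linked formula states, and the name expected is hypothetical.

```python
import pandas as pd

p = pd.DataFrame({'weight': [2, 4, 1, 6], 'Quant': [132, 178, 298, 300]})
n = len(p)
alpha = 2

# Scalar: one value for the whole frame
std_dev = (((p['Quant'] - p['weight']) ** 2).sum() / (n - 1)) ** 0.5

# Series: the scalar is added to every Quant value, giving one bound per row
std_devf = p['Quant'] + alpha * std_dev
print(std_devf.round(2))

# Hypothetical single value: the mean of the per-row bounds
expected = std_devf.mean()
print(round(expected, 2))
```

Series.round(2) is the method form of the built-in round used above; both return a Series here.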
Answer:
For calculating your standard deviation, you can follow my method:
import math
s = 0
for i in range(n):
    # add the squared difference of row i to the running total
    s += (p['Quant'][i] - p['weight'][i]) ** 2
std_dev = math.sqrt(s / (n - 1))
alpha = 2
std_devf = p['Quant'] + alpha * std_dev
Hope this solves your query. An image of my solution is here: https://i.stack.imgur.com/3HH5b.png
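One caveat on the loop approach: p['Quant'][i] looks rows up by index label, not by position, so it raises a KeyError if some label in 0..n-1 is missing, for example after filtering rows. A sketch of the same accumulation that iterates positionally with zip, assuming the question's columns; the function name sample_std_of_diff is hypothetical.

```python
import math
import pandas as pd

def sample_std_of_diff(quant, weight):
    """Hypothetical helper: sqrt(sum((q - w)^2) / (n - 1)), looping by position."""
    n = len(quant)
    s = 0
    # zip pairs the values positionally, so the index labels do not matter
    for q, w in zip(quant, weight):
        s += (q - w) ** 2
    return math.sqrt(s / (n - 1))

p = pd.DataFrame({'weight': [2, 4, 1, 6], 'Quant': [132, 178, 298, 300]})

# Drop one row; the remaining index labels are 0, 2, 3, so label 1 no longer exists
subset = p[p['Quant'] != 178]

std_dev = sample_std_of_diff(subset['Quant'], subset['weight'])
print(round(std_dev, 2))
```

On the unfiltered frame the helper gives the same result as the loop in the answer; on subset, the label-based loop would fail at i = 1.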