I can very easily get the standard deviation of some numbers in a 1D list in numpy like below:
import numpy as np
arr1 = np.array([100, 100, 100, 200, 200, 500])
sd = np.std(arr1)
print(sd)
But my data is in the form of a 2D list, in which the second value of each inner list is the frequency:
arr2 = np.array([[100, 3], [200, 2], [500, 1]])
How can I flatten it based on frequency (change arr2 into arr1) to get the correct standard deviation?
CodePudding user response:
Use arr2[:, 0].repeat(arr2[:, 1]).
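As a minimal sketch of how that one-liner fits into the question's example (the names values, counts, and sd are only illustrative), the first column can be repeated by the second and the result passed to std():

```python
import numpy as np

arr2 = np.array([[100, 3], [200, 2], [500, 1]])

# Column 0 holds the values, column 1 holds how often each value occurs.
values, counts = arr2[:, 0], arr2[:, 1]

# Repeat each value by its frequency to rebuild the flat 1D array.
arr1 = values.repeat(counts)

# Population standard deviation (ddof=0, NumPy's default).
sd = arr1.std()
print(arr1)
print(sd)
```

This rebuilds the same array as arr1 in the question, so the result should match np.std(arr1) there.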
CodePudding user response:
If you want the whole array flattened, you can use ravel()
arr2.ravel()
# output: array([100, 3, 200, 2, 500, 1])
If you want a specific column, you can select all rows and use the index of the column
arr2[:,1]
# output: array([3, 2, 1])
arr2[:,0]
# output: array([100, 200, 500])
To get the standard deviation, you can add .std() at the end
sd = arr2.ravel().std()
# or
sd = arr2[:,0].std()
# or
sd = arr2[:,1].std()
# etc
CodePudding user response:
While @timgeb's (good) answer is the most straightforward, it might not be efficient if the frequencies are very large, such as in np.array([[100, 3000], [200, 20000], [500, 100]]), because repeat builds an array with one element per occurrence.
In this case you can compute the standard deviation manually:
v,r = arr2.T
n = r.sum()
avg = (v*r).sum()/n
std = np.sqrt((r*(v-avg)**2).sum()/n)
# output: 141.4213562373095
Or use statsmodels:
from statsmodels.stats.weightstats import DescrStatsW
v,r = arr2.T
DescrStatsW(v, weights=r, ddof=0).std
# 141.4213562373095
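If you would rather stay within NumPy, the same weighted calculation can probably also be written with np.average and its weights argument. This is only a sketch of the manual formula above in a different spelling; the names values, counts, mean, var, and sd are illustrative:

```python
import numpy as np

arr2 = np.array([[100, 3], [200, 2], [500, 1]])

# Transposing splits the array into a row of values and a row of frequencies.
values, counts = arr2.T

# Frequency-weighted mean of the values.
mean = np.average(values, weights=counts)

# Frequency-weighted mean of the squared deviations (population variance).
var = np.average((values - mean) ** 2, weights=counts)

sd = np.sqrt(var)
print(sd)
```

Like the manual version, this never expands the data, so memory use depends on the number of distinct values rather than on the sum of the frequencies.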