Compute standard deviation on parts of a data frame

Time:06-20

I have a dataframe of 2000 columns and 1 row. I want to calculate the STD for 50 columns at a time. Any ideas on how to do it?

thank you,

CodePudding user response:

To compute the standard deviation of the first 50 columns, use:

out = df.iloc[0, :50].std()

To compute the standard deviation for each consecutive block of 50 columns, use:

s = df.iloc[0]

out = s.groupby(np.arange(len(s)) // 50).std()
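The grouping key is what does the work here: integer division of each position by the block size gives every block of columns the same label. A minimal sketch of that idea, using a made-up 6-column frame and a block size of 3 (both are assumptions for illustration, not the asker's data):

```python
import numpy as np
import pandas as pd

# Hypothetical 1-row frame with 6 columns, processed 3 columns at a time
df = pd.DataFrame([[1, 2, 3, 10, 20, 30]]).add_prefix('c')
s = df.iloc[0]  # the single row as a Series indexed by column name

# Integer division assigns one label per block of 3 positions
key = np.arange(len(s)) // 3
print(key)  # [0 0 0 1 1 1]

# groupby on the label array, then the sample standard deviation per block
out = s.groupby(key).std()
print(out.tolist())  # [1.0, 10.0]
```

Note that pandas `std()` defaults to `ddof=1` (sample standard deviation). Pass `ddof=0` if the population standard deviation is wanted.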

Sample (20 columns, in blocks of 5 instead of 50):

import numpy as np
import pandas as pd

np.random.seed(202206)
df = pd.DataFrame([np.random.randint(20, size=20)]).add_prefix('c')
print(df)
   c0  c1  c2  c3  c4  c5  c6  c7  c8  c9  c10  c11  c12  c13  c14  c15  c16  \
0   7   8  12  15   2  10   3   3   6   7    5   19    1   10   15   12   15   

   c17  c18  c19  
0    9   18   10  

out = df.iloc[0, :5].std()
print(out)
4.969909455915671

s = df.iloc[0]

out = s.groupby(np.arange(len(s)) // 5).std()
print(out)
0    4.969909
1    2.949576
2    7.280110
3    3.701351
Name: 0, dtype: float64
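As an alternative that is not part of the answer above, the same per-block result can probably be obtained with plain NumPy by reshaping the row into one block per line. This sketch assumes the number of columns is an exact multiple of the block size (true for 2000 columns and blocks of 50) and uses made-up data:

```python
import numpy as np
import pandas as pd

# Hypothetical 1-row frame with 20 columns holding 0..19
df = pd.DataFrame([np.arange(20)]).add_prefix('c')
chunk = 5  # block size; assumed to divide the column count evenly

# Reshape the row to (n_blocks, chunk) and take the std along each block.
# ddof=1 matches the pandas default; NumPy's own default is ddof=0.
values = df.iloc[0].to_numpy()
np_std = values.reshape(-1, chunk).std(axis=1, ddof=1)

# Cross-check against the groupby approach
pd_std = df.iloc[0].groupby(np.arange(df.shape[1]) // chunk).std()
print(np.allclose(np_std, pd_std.to_numpy()))  # True
```

If the column count is not a multiple of the block size, `reshape` raises an error, so the `groupby` version is the safer choice in that case.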