Home > Software engineering >  Python - standard deviation of chosen rows in each column
Python - standard deviation of chosen rows in each column

Time:10-13

I would like to calculate the standard deviation for each column in the data frame, but only for the selected rows. I would like to reflect this formula from Excel (calculates standard deviation only for highlighted cells and moves one index down each column)

fromula in excel

Could you please advise me how should I construct my loop to do that? Thank you.

CodePudding user response:

I would use numpy for that:

example input:

>>> df = pd.DataFrame(np.arange(50).reshape(10,5,order='F')).add_prefix('m')
>>> df
   m0  m1  m2  m3  m4
0   0  10  20  30  40
1   1  11  21  31  41
2   2  12  22  32  42
3   3  13  23  33  43
4   4  14  24  34  44
5   5  15  25  35  45
6   6  16  26  36  46
7   7  17  27  37  47
8   8  18  28  38  48
9   9  19  29  39  49

output (here using 5 rows):

>>> pd.Series(np.c_[[np.diagonal(df.values, offset=-i) for i in range(5)]].std(axis=0), index=df.columns)
m0    1.414214
m1    1.414214
m2    1.414214
m3    1.414214
m4    1.414214

This works by creating an intermediary array:

>>> np.c_[[np.diagonal(df.values, offset=-i) for i in range(5)]]
array([[ 0, 11, 22, 33, 44],
       [ 1, 12, 23, 34, 45],
       [ 2, 13, 24, 35, 46],
       [ 3, 14, 25, 36, 47],
       [ 4, 15, 26, 37, 48]])

imports:

import pandas as pd
import numpy as np
  • Related