Home > OS >  Python Pandas groupby and mean/stdev all columns into one column
Python Pandas groupby and mean/stdev all columns into one column

Time:05-31

this was closed with the wrong answer, please provide the answers

group_ = pd.DataFrame({'Name' : ['X','Y', 'X', 'X', 'Y','Y'],
     'a' : [3,6,4,7,9,3],
     'b' : [6,4,8,7,6,5],
     'h' : [34,43,63,34,54,56]})

  Name  a  b   h
0    X  3  6  34
1    Y  6  4  43
2    X  4  8  63
3    X  7  7  34
4    Y  9  6  54
5    Y  3  5  56

I would like to get the mean and std values of X and Y for all columns ( without listing them in agg, or anywhere has to be dynamic).

for mean I could have done this :

group_.groupby('Name').mean().mean(axis=1)

Name
X    18.444444
Y    20.666667
dtype: float64

np.mean([3,6,34,4,8,63,7,7,34]) = 18.444444

but for std not, doesn't work, because it gives the std of std of 3 columns.

group_.groupby('Name').std().std(axis=1)

Name
X    8.793713
Y    3.055050
dtype: float64

NOT CORRECT! X_std = np.std([3,6,34,4,8,63,7,7,34]) = 19.556818141059377

CodePudding user response:

For std by all columns per groups first use DataFrame.melt and then GroupBy.std with ddof=0 (default pandas ddof=1):

a = group_.melt('Name').groupby('Name')['value'].std(ddof=0)

print (a)
Name
X    19.556818
Y    21.756225
Name: value, dtype: float64

X_std = np.std([3,6,34,4,8,63,7,7,34])
print (X_std)
19.556818141059377

For both aggregate functions:

df = group_.melt('Name').groupby('Name').agg(std=('value', lambda x: x.std(ddof=0)),
                                             mean=('value','mean'))

print (df)
            std       mean
Name                      
X     19.556818  18.444444
Y     21.756225  20.666667
  • Related