Home > Enterprise >  How to Group by the mean of specific columns in Python
How to Group by the mean of specific columns in Python

Time:12-25

In the dataframe below:

import pandas as pd
import numpy as np
df= {
    'Gen':['M','M','M','M','F','F','F','F','M','M','M','M','F','F','F','F'],
    'Site':['FRX','FX','FRX','FRX','FRX','FX','FRX','FX','FX','FX','FX','FRX','FRX','FRX','FRX','FRX'],
    'Type':['L','L','L','L','L','L','L','L','R','R','R','R','R','R','R','R'],
     'AIC':['<1','<1','<1','<1',1,1,1,1,2,2,2,2,'>2','>2','>2','>2'],
     'AIC_TRX':[1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4],
     'diff':[-1,-1,-1,-1,0,0,0,0,1,1,1,1,3,3,3,3],
     'series':[1,2,4,8,1,2,4,8,1,2,4,8,1,2,4,8],
    'Grwth_Time1':[150.78,162.34,188.53,197.69,208.07,217.76,229.48,139.51,146.87,182.54,189.57,199.97,229.28,244.73,269.91,249.19],
     'Grwth_Time2':[250.78,262.34,288.53,297.69,308.07,317.7,329.81,339.15,346.87,382.54,369.59,399.97,329.28,347.73,369.91,349.12],
     'Grwth_Time3':[240.18,232.14,258.53,276.69,338.07,307.74,359.16,339.25,365.87,392.48,399.97,410.75,429.08,448.39,465.15,469.33],
     'Grwth_Time4':[270.84,282.14,298.53,306.69,318.73,327.47,369.63,389.59,398.75,432.18,449.78,473.55,494.85,509.39,515.52,539.23],
      'Grwth_Time5':[25.78,22.34,28.53,27.69,30.07,17.7,29.81,33.15,34.87,32.54,36.59,39.97,29.28,34.73,36.91,34.12],
     'Grwth_Time6':[240.18,232.14,258.53,276.69,338.07,307.74,359.16,339.25,365.87,392.48,399.97,410.75,429.08,448.39,465.15,469.33],
     'Grwth_Time7':[27.84,28.14,29.53,30.69,18.73,27.47,36.63,38.59,38.75,24.18,24.78,21.55,13.85,9.39,15.52,39.23],
}
df = pd.DataFrame(df,columns = ['Gen','Site','Type','AIC','AIC_TRX','diff','series','Grwth_Time1','Grwth_Time2','Grwth_Time3','Grwth_Time4','Grwth_Time5','Grwth_Time6','Grwth_Time7'])
df.info()

I want to do the following:

  1. Find the average of each unique series per AIC_TRX for each Grwth_Time (Grwth_Time1, Grwth_Time2,....,Grwth_Time7)

  2. Export all the outputs as one xlsx file (refer to the figure below)

The desired outputs look like the figure below (note: the numbers in this output are not the actual average values, they were randomly generated)

enter image description here

My attempt:

# Select the columns -> AIC_TRX, series, Grwth_Time1,Grwth_Time2,....,Grwth_Time7 
df1 = df[['AIC_TRX', 'diff', 'series',
       'Grwth_Time1', 'Grwth_Time2', 'Grwth_Time3', 'Grwth_Time4',
       'Grwth_Time5', 'Grwth_Time6', 'Grwth_Time7']]


#Below is where I need help, I want to groupby the 'series' and 'AIC_TRX' for all the 'Grwth_Time1_to_7' 

df1.groupby('series').Grwth_Time1.agg(['mean']) 

Thanks in advance

CodePudding user response:

You have to groupby two columns: ['series', 'AIC_TRX'] and find mean of each Grwth_Time.

df.groupby(['series', 'AIC_TRX'])[['Grwth_Time1', 'Grwth_Time2', 'Grwth_Time3', 
        'Grwth_Time4', 'Grwth_Time5', 'Grwth_Time6', 'Grwth_Time7']].mean().unstack().to_excel("output.xlsx")

Output:

AIC_TRX       1       2       3       4
series                                 
1        150.78  208.07  146.87  229.28
2        162.34  217.76  182.54  244.73
4        188.53  229.48  189.57  269.91
8        197.69  139.51  199.97  249.19


AIC_TRX       1       2       3       4
series                                 
1        250.78  308.07  346.87  329.28
2        262.34  317.70  382.54  347.73
4        288.53  329.81  369.59  369.91
8        297.69  339.15  399.97  349.12


AIC_TRX       1       2       3       4
series                                 
1        240.18  338.07  365.87  429.08
2        232.14  307.74  392.48  448.39
4        258.53  359.16  399.97  465.15
8        276.69  339.25  410.75  469.33


AIC_TRX       1       2       3       4
series                                 
1        270.84  318.73  398.75  494.85
2        282.14  327.47  432.18  509.39
4        298.53  369.63  449.78  515.52
8        306.69  389.59  473.55  539.23


AIC_TRX      1      2      3      4
series                             
1        25.78  30.07  34.87  29.28
2        22.34  17.70  32.54  34.73
4        28.53  29.81  36.59  36.91
8        27.69  33.15  39.97  34.12


AIC_TRX       1       2       3       4
series                                 
1        240.18  338.07  365.87  429.08
2        232.14  307.74  392.48  448.39
4        258.53  359.16  399.97  465.15
8        276.69  339.25  410.75  469.33


AIC_TRX      1      2      3      4
series                             
1        27.84  18.73  38.75  13.85
2        28.14  27.47  24.18   9.39
4        29.53  36.63  24.78  15.52
8        30.69  38.59  21.55  39.23

CodePudding user response:

Just use the df.apply method to average across each column based on series and AIC_TRX grouping.

result = df1.groupby(['series', 'AIC_TRX']).apply(np.mean, axis=1)

Result:

series  AIC_TRX    
1       1        0     120.738
        2        4     156.281
        3        8     170.285
        4        12    196.270
2       1        1     122.358
        2        5     152.758
        3        9     184.494
        4        13    205.175
4       1        2     135.471
        2        6     171.968
        3        10    187.825
        4        14    214.907
8       1        3     142.183
        2        7     162.849
        3        11    196.851
        4        15    216.455
dtype: float64
  • Related