In the dataframe below:
import pandas as pd
import numpy as np
df= {
'Gen':['M','M','M','M','F','F','F','F','M','M','M','M','F','F','F','F'],
'Site':['FRX','FX','FRX','FRX','FRX','FX','FRX','FX','FX','FX','FX','FRX','FRX','FRX','FRX','FRX'],
'Type':['L','L','L','L','L','L','L','L','R','R','R','R','R','R','R','R'],
'AIC':['<1','<1','<1','<1',1,1,1,1,2,2,2,2,'>2','>2','>2','>2'],
'AIC_TRX':[1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4],
'diff':[-1,-1,-1,-1,0,0,0,0,1,1,1,1,3,3,3,3],
'series':[1,2,4,8,1,2,4,8,1,2,4,8,1,2,4,8],
'Grwth_Time1':[150.78,162.34,188.53,197.69,208.07,217.76,229.48,139.51,146.87,182.54,189.57,199.97,229.28,244.73,269.91,249.19],
'Grwth_Time2':[250.78,262.34,288.53,297.69,308.07,317.7,329.81,339.15,346.87,382.54,369.59,399.97,329.28,347.73,369.91,349.12],
'Grwth_Time3':[240.18,232.14,258.53,276.69,338.07,307.74,359.16,339.25,365.87,392.48,399.97,410.75,429.08,448.39,465.15,469.33],
'Grwth_Time4':[270.84,282.14,298.53,306.69,318.73,327.47,369.63,389.59,398.75,432.18,449.78,473.55,494.85,509.39,515.52,539.23],
'Grwth_Time5':[25.78,22.34,28.53,27.69,30.07,17.7,29.81,33.15,34.87,32.54,36.59,39.97,29.28,34.73,36.91,34.12],
'Grwth_Time6':[240.18,232.14,258.53,276.69,338.07,307.74,359.16,339.25,365.87,392.48,399.97,410.75,429.08,448.39,465.15,469.33],
'Grwth_Time7':[27.84,28.14,29.53,30.69,18.73,27.47,36.63,38.59,38.75,24.18,24.78,21.55,13.85,9.39,15.52,39.23],
}
df = pd.DataFrame(df,columns = ['Gen','Site','Type','AIC','AIC_TRX','diff','series','Grwth_Time1','Grwth_Time2','Grwth_Time3','Grwth_Time4','Grwth_Time5','Grwth_Time6','Grwth_Time7'])
df.info()
I want to do the following:
Find the average of each unique
series
perAIC_TRX
for eachGrwth_Time
(Grwth_Time1
,Grwth_Time2
,....
,Grwth_Time7
)Export all the outputs as one
xlsx
file (refer to the figure below)
The desired outputs look like the figure below (note: the numbers in this output are not the actual average values, they were randomly generated)
My attempt:
# Select the columns -> AIC_TRX, series, Grwth_Time1,Grwth_Time2,....,Grwth_Time7
df1 = df[['AIC_TRX', 'diff', 'series',
'Grwth_Time1', 'Grwth_Time2', 'Grwth_Time3', 'Grwth_Time4',
'Grwth_Time5', 'Grwth_Time6', 'Grwth_Time7']]
#Below is where I need help, I want to groupby the 'series' and 'AIC_TRX' for all the 'Grwth_Time1_to_7'
df1.groupby('series').Grwth_Time1.agg(['mean'])
Thanks in advance
CodePudding user response:
You have to groupby
two columns: ['series', 'AIC_TRX']
and find mean of each Grwth_Time
.
df.groupby(['series', 'AIC_TRX'])[['Grwth_Time1', 'Grwth_Time2', 'Grwth_Time3',
'Grwth_Time4', 'Grwth_Time5', 'Grwth_Time6', 'Grwth_Time7']].mean().unstack().to_excel("output.xlsx")
Output:
AIC_TRX 1 2 3 4
series
1 150.78 208.07 146.87 229.28
2 162.34 217.76 182.54 244.73
4 188.53 229.48 189.57 269.91
8 197.69 139.51 199.97 249.19
AIC_TRX 1 2 3 4
series
1 250.78 308.07 346.87 329.28
2 262.34 317.70 382.54 347.73
4 288.53 329.81 369.59 369.91
8 297.69 339.15 399.97 349.12
AIC_TRX 1 2 3 4
series
1 240.18 338.07 365.87 429.08
2 232.14 307.74 392.48 448.39
4 258.53 359.16 399.97 465.15
8 276.69 339.25 410.75 469.33
AIC_TRX 1 2 3 4
series
1 270.84 318.73 398.75 494.85
2 282.14 327.47 432.18 509.39
4 298.53 369.63 449.78 515.52
8 306.69 389.59 473.55 539.23
AIC_TRX 1 2 3 4
series
1 25.78 30.07 34.87 29.28
2 22.34 17.70 32.54 34.73
4 28.53 29.81 36.59 36.91
8 27.69 33.15 39.97 34.12
AIC_TRX 1 2 3 4
series
1 240.18 338.07 365.87 429.08
2 232.14 307.74 392.48 448.39
4 258.53 359.16 399.97 465.15
8 276.69 339.25 410.75 469.33
AIC_TRX 1 2 3 4
series
1 27.84 18.73 38.75 13.85
2 28.14 27.47 24.18 9.39
4 29.53 36.63 24.78 15.52
8 30.69 38.59 21.55 39.23
CodePudding user response:
Just use the df.apply
method to average across each column based on series
and AIC_TRX
grouping.
result = df1.groupby(['series', 'AIC_TRX']).apply(np.mean, axis=1)
Result:
series AIC_TRX
1 1 0 120.738
2 4 156.281
3 8 170.285
4 12 196.270
2 1 1 122.358
2 5 152.758
3 9 184.494
4 13 205.175
4 1 2 135.471
2 6 171.968
3 10 187.825
4 14 214.907
8 1 3 142.183
2 7 162.849
3 11 196.851
4 15 216.455
dtype: float64