Home > front end >  Panda Dataframe - aggregate average by columns range
Panda Dataframe - aggregate average by columns range

Time:05-11

I have following data frame

ID    M1   M2   M3   D1   D2   
1     0,2  0,5  0,2  1    3  
2     0,1  0,4  1    4    6   
3     1    4    1    1    1  

My goal is to create aggregated column average for each of the rows, grouping from M1 to M3, D1 to D2, expected output:

ID   M_avg    D_avg
1    0,3      2
2    0,5      5
3    2        1

I'm confused on how to do that. Thx

CodePudding user response:

Convert ID to index and for all another columns need numeric values:

df = df.set_index('ID').replace(',','.', regex=True).astype(float)

Then aggregate by columns names with Series.str.extractSeries.str.extractall for remove numbers with mean and DataFrame.add_suffix:

df = (df.groupby(df.columns.str.extract('(\D )', expand=False), axis=1)
        .mean()
        .add_suffix('_avg')
        .reset_index())
print (df)
   ID  D_avg  M_avg
0   1    2.0    0.3
1   2    5.0    0.5
2   3    1.0    2.0

Or use str.replace:

df = df.set_index('ID').replace(',','.', regex=True).astype(float)

df = (df.groupby(df.columns.str.replace('\d ', '', regex=True), axis=1)
        .mean()
        .add_suffix('_avg')
        .reset_index())
print (df)
   ID  D_avg  M_avg
0   1    2.0    0.3
1   2    5.0    0.5
2   3    1.0    2.0

CodePudding user response:

df["M_avg"] = df.iloc[:,:3].mean(axis=1)

df["D_avg"] = df.iloc[:,3:5].mean(axis=1)

  • Related