I have following data frame
ID M1 M2 M3 D1 D2
1 0,2 0,5 0,2 1 3
2 0,1 0,4 1 4 6
3 1 4 1 1 1
My goal is to create aggregated column average for each of the rows, grouping from M1 to M3, D1 to D2, expected output:
ID M_avg D_avg
1 0,3 2
2 0,5 5
3 2 1
I'm confused on how to do that. Thx
CodePudding user response:
Convert ID
to index and for all another columns need numeric values:
df = df.set_index('ID').replace(',','.', regex=True).astype(float)
Then aggregate by columns names with Series.str.extract
Series.str.extractall
for remove numbers with mean
and DataFrame.add_suffix
:
df = (df.groupby(df.columns.str.extract('(\D )', expand=False), axis=1)
.mean()
.add_suffix('_avg')
.reset_index())
print (df)
ID D_avg M_avg
0 1 2.0 0.3
1 2 5.0 0.5
2 3 1.0 2.0
Or use str.replace
:
df = df.set_index('ID').replace(',','.', regex=True).astype(float)
df = (df.groupby(df.columns.str.replace('\d ', '', regex=True), axis=1)
.mean()
.add_suffix('_avg')
.reset_index())
print (df)
ID D_avg M_avg
0 1 2.0 0.3
1 2 5.0 0.5
2 3 1.0 2.0
CodePudding user response:
df["M_avg"] = df.iloc[:,:3].mean(axis=1)
df["D_avg"] = df.iloc[:,3:5].mean(axis=1)