I have a dataframe with the following columns
rater_id being_rated_id combine_id avg_jump avg_run avg_swim category
100 200 100200 3 3 2 heats
100 200 100200 4 4 1 heats
101 200 101200 1 1 2 finals
101 200 101200 2 3 2 finals
102 201 102201 3 2 3 heats
103 202 103202 4 4 4 finals
I'd like to use a function to loop through the columns with prefix ("avg") and groupby "combine_id" and "category" to create new columns with suffix ("_2") that give the average of the rows that have multiple entries of the "combine_id"
What I am to achieve
rater_id being_rated_id combine_id avg_jump avg_run category avg_jump_2 avg_run_2
100 200 100200 3 2 heats 3.5 2.5
100 200 100200 4 3 heats 3.5 2.5
101 200 101200 1 1 finals 1.5 2
101 200 101200 2 3 finals 1.5 2
102 201 102201 3 2 heats 3 2
103 202 103202 4 4 finals 4 4
I've tried the following code but it doesnt seem to work
collist = ['avg_']
for col in collist:
avgcols = df.filter(like=col).columns
if len(avgcols) > 0:
df[f'{col}_2'] = df.groupby(['combine_id','category'])[avgcols].transform(np.mean)
Appreciate any advice and help, thank you.
CodePudding user response:
Groupby transform, rename the columns using add_suffix and join on index using assign. Code below
df.assign(**df.groupby(["combine_id" , "category"])[['avg_jump', 'avg_run']].transform('mean').add_suffix('_2'))