Home > OS >  Python Loop Function to Create New Columns based on Groupby of multiple columns
Python Loop Function to Create New Columns based on Groupby of multiple columns

Time:12-14

I have a dataframe with the following columns

rater_id   being_rated_id   combine_id  avg_jump   avg_run   avg_swim   category
100        200              100200      3          3         2          heats
100        200              100200      4          4         1          heats
101        200              101200      1          1         2          finals
101        200              101200      2          3         2          finals
102        201              102201      3          2         3          heats
103        202              103202      4          4         4          finals

I'd like to use a function to loop through the columns with prefix ("avg") and groupby "combine_id" and "category" to create new columns with suffix ("_2") that give the average of the rows that have multiple entries of the "combine_id"

What I am to achieve

rater_id   being_rated_id   combine_id  avg_jump   avg_run   category  avg_jump_2  avg_run_2
100        200              100200      3          2         heats     3.5         2.5
100        200              100200      4          3         heats     3.5         2.5
101        200              101200      1          1         finals    1.5         2
101        200              101200      2          3         finals    1.5         2
102        201              102201      3          2         heats     3           2
103        202              103202      4          4         finals    4           4 

I've tried the following code but it doesnt seem to work


collist = ['avg_']

for col in collist:
    avgcols = df.filter(like=col).columns
    if len(avgcols) > 0:
        df[f'{col}_2'] = df.groupby(['combine_id','category'])[avgcols].transform(np.mean)  

Appreciate any advice and help, thank you.

CodePudding user response:

Groupby transform, rename the columns using add_suffix and join on index using assign. Code below

df.assign(**df.groupby(["combine_id" , "category"])[['avg_jump', 'avg_run']].transform('mean').add_suffix('_2'))
  • Related