How to apply multiple operations on multiple columns based on a single column in pandas?

I have a sample dataframe that looks like this:

     primaryName    averageRating                 primaryProfession    knownForTitles runtimeMinutes
1   Fred Astaire            7.0      soundtrack,actor,miscellaneous      tt0072308            165
2   Fred Astaire            6.9      soundtrack,actor,miscellaneous      tt0031983             93
3   Fred Astaire            7.0      soundtrack,actor,miscellaneous      tt0050419            103
4   Fred Astaire            7.1      soundtrack,actor,miscellaneous      tt0053137            134

I want to group by the primaryName column and, for each name, take the average of the averageRating column, extract "actor" or "actress" from the primaryProfession column, count the knownForTitles column, and sum the runtimeMinutes column. The output dataframe should look like this:

     primaryName    averageRating      primaryProfession    knownForTitles   runtimeMinutes
1   Fred Astaire            28                    actor            4            495

Any ideas how I can achieve this? Thanks in advance for the help.

CodePudding user response:

Try this:

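# Collapse the comma-separated profession list to a single label.
# na=False makes missing professions count as non-matches.
df.loc[df['primaryProfession'].str.contains('actor', na=False), 'primaryProfession'] = 'actor'
df.loc[df['primaryProfession'].str.contains('actress', na=False), 'primaryProfession'] = 'actress'

# Aggregate per person and profession: mean rating, title count, total runtime.
result = df.groupby(['primaryName', 'primaryProfession'], as_index=False).agg({'averageRating':'mean', 'knownForTitles':'count', 'runtimeMinutes':'sum'})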
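
For reference, here is a self-contained sketch of the same idea that can be run as-is. It rebuilds the sample rows from the question, uses str.extract instead of two str.contains assignments, and uses named aggregation so the output column names are explicit. Two things here are assumptions rather than part of the original answer: runtimeMinutes is coerced with pd.to_numeric in case it was loaded as text, and rows with neither "actor" nor "actress" end up with NaN as the profession, which groupby drops by default. Note also that 'mean' gives 7.0 for these rows, while the 28 shown in the question's expected output is the sum of the ratings; swap in 'sum' if that is what you want.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "primaryName": ["Fred Astaire"] * 4,
    "averageRating": [7.0, 6.9, 7.0, 7.1],
    "primaryProfession": ["soundtrack,actor,miscellaneous"] * 4,
    "knownForTitles": ["tt0072308", "tt0031983", "tt0050419", "tt0053137"],
    "runtimeMinutes": [165, 93, 103, 134],
})

# Keep only the "actor" or "actress" token; rows with neither become NaN.
df["primaryProfession"] = df["primaryProfession"].str.extract(
    r"\b(actor|actress)\b", expand=False
)

# Assumption: runtimeMinutes may have been read as text, so coerce before summing.
df["runtimeMinutes"] = pd.to_numeric(df["runtimeMinutes"], errors="coerce")

# Named aggregation: output_column=(input_column, function).
result = (
    df.groupby(["primaryName", "primaryProfession"], as_index=False)
      .agg(
          averageRating=("averageRating", "mean"),
          knownForTitles=("knownForTitles", "count"),
          runtimeMinutes=("runtimeMinutes", "sum"),
      )
)
print(result)
```

If people without an acting credit should stay in the result, passing dropna=False to groupby keeps the NaN group instead of discarding it.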