I have a sample dataframe that looks like this:
   primaryName   averageRating  primaryProfession               knownForTitles  runtimeMinutes
1  Fred Astaire  7.0            soundtrack,actor,miscellaneous  tt0072308       165
2  Fred Astaire  6.9            soundtrack,actor,miscellaneous  tt0031983       93
3  Fred Astaire  7.0            soundtrack,actor,miscellaneous  tt0050419       103
4  Fred Astaire  7.1            soundtrack,actor,miscellaneous  tt0053137       134
Basically, I want to group by the primaryName column and, for each name, take the average of the averageRating column, extract "actor"/"actress" from the primaryProfession column, count the knownForTitles column, and sum the runtimeMinutes column.
The output dataframe should look like this:
   primaryName   averageRating  primaryProfession  knownForTitles  runtimeMinutes
1  Fred Astaire  28             actor              4               495
Any ideas how I can achieve this? Thanks in advance for the help.
CodePudding user response:
Try this:
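# Collapse the comma-separated profession list to just 'actor' or 'actress' (na=False skips missing values)
df.loc[df['primaryProfession'].str.contains('actor', na=False), 'primaryProfession'] = 'actor'
df.loc[df['primaryProfession'].str.contains('actress', na=False), 'primaryProfession'] = 'actress'
# Aggregate per name and profession: mean rating, number of titles, total runtime
result = df.groupby(['primaryName', 'primaryProfession'], as_index=False).agg({'averageRating':'mean', 'knownForTitles':'count', 'runtimeMinutes':'sum'})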
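For reference, here is a self-contained sketch of the same idea run against the four sample rows from the question. It is only one possible variant: it assumes the professions are always comma-separated, uses str.extract with a regex instead of two str.contains assignments, and uses named aggregation in place of the dict form. The variable name result is just for illustration. Note that the mean of the four ratings is 7.0; the 28 shown in the expected output is their sum, so swap "mean" for "sum" if that is the number you actually want.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "primaryName": ["Fred Astaire"] * 4,
    "averageRating": [7.0, 6.9, 7.0, 7.1],
    "primaryProfession": ["soundtrack,actor,miscellaneous"] * 4,
    "knownForTitles": ["tt0072308", "tt0031983", "tt0050419", "tt0053137"],
    "runtimeMinutes": [165, 93, 103, 134],
})

# Pull out "actor" or "actress" as a whole word; rows with neither become NaN.
df["primaryProfession"] = df["primaryProfession"].str.extract(
    r"\b(actor|actress)\b", expand=False
)

# Group by name and profession, then aggregate each column.
# Rows whose profession is NaN are dropped by groupby's default dropna=True.
result = (
    df.groupby(["primaryName", "primaryProfession"], as_index=False)
      .agg(
          averageRating=("averageRating", "mean"),
          knownForTitles=("knownForTitles", "count"),
          runtimeMinutes=("runtimeMinutes", "sum"),
      )
)
print(result)
```

With the sample data this yields a single row for Fred Astaire with profession actor, 4 titles, and a total runtime of 495 minutes.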