Python pandas merge map with multiple values xlookup-CodePudding

I have a dataframe of actor names:

df1

actor_id    actor_name
1   Brad Pitt
2   Nicole Kidman
3   Matthew Goode
4   Uma Thurman
5   Ethan Hawke

And another dataframe of movies that the actors were in:

df2

actor_id    actor_movie movie_revenue_m
1   Once Upon a Time in Hollywood   150
2   The Others  50
2   Moulin Rouge    200
3   Stoker  75
4   Kill Bill   125
5   Gattaca 85

I want to merge the two dataframes together to show the actors with their movie names and movie revenues, so I use the merge function:

df3 = df1.merge(df2, on = 'actor_id', how = 'left')

df3

actor_id    actor_name  actor_movie movie_revenue
1   Brad Pitt   Once Upon a Time in Hollywood   150
2   Nicole Kidman   Moulin Rouge    50
2   Nicole Kidman   The Others  200
3   Matthew Goode   Stoker  75
4   Uma Thurman Kill Bill   125
5   Ethan Hawke Gattaca 85

But this pulls in all movies, so Nicole Kidman gets duplicated, and I only want to show one movie per actor. How can I merge the dataframes without "duplicating" my list of actors?

How would I merge the movie title that is alphabetically first?

How would I merge the movie title with the highest revenue?

Thank you!

CodePudding user response：

One way is to continue with the merge and then filter the result set

movie title that is alphabetically first

# sort by name, movie and then pick the first while grouping by actor
df.sort_values(['actor_name','actor_movie'] ).groupby('actor_id', as_index=False).first()

    actor_id    actor_name  actor_movie     movie_revenue
0   1   Brad Pitt   Once Upon a Time in Hollywood   150
1   2   Nicole Kidman   Moulin Rouge    50
2   3   Matthew Goode   Stoker  75
3   4   Uma Thurman     Kill Bill   125
4   5   Ethan Hawke     Gattaca     85

movie title with the highest revenue

# sort by name, and review (descending), groupby actor and pick first
df.sort_values(['actor_name','movie_revenue'], ascending=[1,0] ).groupby('actor_id', as_index=False).first()

    actor_id    actor_name  actor_movie     movie_revenue
0   1   Brad Pitt   Once Upon a Time in Hollywood   150
1   2   Nicole Kidman   The Others  200
2   3   Matthew Goode   Stoker  75
3   4   Uma Thurman     Kill Bill   125
4   5   Ethan Hawke     Gattaca     85