Home > Software engineering >  Python pandas merge map with multiple values xlookup
Python pandas merge map with multiple values xlookup

Time:11-04

I have a dataframe of actor names:

df1

actor_id    actor_name
1   Brad Pitt
2   Nicole Kidman
3   Matthew Goode
4   Uma Thurman
5   Ethan Hawke

And another dataframe of movies that the actors were in:

df2

actor_id    actor_movie movie_revenue_m
1   Once Upon a Time in Hollywood   150
2   The Others  50
2   Moulin Rouge    200
3   Stoker  75
4   Kill Bill   125
5   Gattaca 85

I want to merge the two dataframes together to show the actors with their movie names and movie revenues, so I use the merge function:

df3 = df1.merge(df2, on = 'actor_id', how = 'left')

df3

actor_id    actor_name  actor_movie movie_revenue
1   Brad Pitt   Once Upon a Time in Hollywood   150
2   Nicole Kidman   Moulin Rouge    50
2   Nicole Kidman   The Others  200
3   Matthew Goode   Stoker  75
4   Uma Thurman Kill Bill   125
5   Ethan Hawke Gattaca 85

But this pulls in all movies, so Nicole Kidman gets duplicated, and I only want to show one movie per actor. How can I merge the dataframes without "duplicating" my list of actors?

How would I merge the movie title that is alphabetically first?

How would I merge the movie title with the highest revenue?

Thank you!

CodePudding user response:

One way is to continue with the merge and then filter the result set

movie title that is alphabetically first

# sort by name, movie and then pick the first while grouping by actor
df.sort_values(['actor_name','actor_movie'] ).groupby('actor_id', as_index=False).first()
    actor_id    actor_name  actor_movie     movie_revenue
0   1   Brad Pitt   Once Upon a Time in Hollywood   150
1   2   Nicole Kidman   Moulin Rouge    50
2   3   Matthew Goode   Stoker  75
3   4   Uma Thurman     Kill Bill   125
4   5   Ethan Hawke     Gattaca     85

movie title with the highest revenue

# sort by name, and review (descending), groupby actor and pick first
df.sort_values(['actor_name','movie_revenue'], ascending=[1,0] ).groupby('actor_id', as_index=False).first()
    actor_id    actor_name  actor_movie     movie_revenue
0   1   Brad Pitt   Once Upon a Time in Hollywood   150
1   2   Nicole Kidman   The Others  200
2   3   Matthew Goode   Stoker  75
3   4   Uma Thurman     Kill Bill   125
4   5   Ethan Hawke     Gattaca     85
  • Related