Home > Back-end >  How to retrieve a column that appear in multiple rows based on a condition with pandas?
How to retrieve a column that appear in multiple rows based on a condition with pandas?

Time:02-10

I am using a large IMDB dataset to do some practicing.

df_movies[(df_movies['movie'] == 'Fight Club (1999)') | (df_movies['movie'] == 'Se7en (1995)')].value_counts('actor')

The above query has given me this dataset.


movie   actor   year    duration    adult_movie rating  votes
3506    Se7en (1995)    Brad Pitt (b.1963)  1995    127.0   0   8.6 1342867.0
3513    Fight Club (1999)   Brad Pitt (b.1963)  1999    139.0   0   8.8 1742920.0
5380    Se7en (1995)    Morgan Freeman (b.1937) 1995    127.0   0   8.6 1342867.0
8065    Se7en (1995)    Kevin Spacey (b.1959)   1995    127.0   0   8.6 1342867.0
40124   Fight Club (1999)   Meat Loaf (b.1947)  1999    139.0   0   8.8 1742920.0
40905   Fight Club (1999)   Edward Norton (b.1969)  1999    139.0   0   8.8 1742920.0
46730   Se7en (1995)    Andrew Kevin Walker (b.1964)    1995    127.0   0   8.6 1342867.0
390010  Fight Club (1999)   Zach Grenier (b.1954)   1999    139.0   0   8.8 1742920.0

Brad Pitt is the only actor on this list to appear in both movies. How can I filter it such that Brad Pitt is returned?

CodePudding user response:

Using pd.value_counts(), you can obtain the number of occurrences of each actor in the list. The result is sorted from highest to lowest.

df.drop_duplicates(['movie', 'actor'])['actor'].value_counts().index[0]

CodePudding user response:

Replace value_counts by groupby_filter:

movies = ['Fight Club (1999)', 'Se7en (1995)']

both_movies = lambda x: len(x) == len(movies)
out = df_movies[df_movies['movie'].isin(movies)].groupby('actor').filter(both_movies)

Output:

>>> out
                  movie               actor  year  duration  adult_movie  rating      votes
3506       Se7en (1995)  Brad Pitt (b.1963)  1995     127.0            0     8.6  1342867.0
3513  Fight Club (1999)  Brad Pitt (b.1963)  1999     139.0            0     8.8  1742920.0
  • Related