Why does my use of "isin" to filter my data frame's rows by column based on values in

Time:11-16

I'm trying to build a function that takes the genres linked to a movieId, stored as a list, and returns other movies that share one or more of those genres. I can create the list and have confirmed that it is a list, but when I pass it to "isin" as a filter, I get an empty data frame.

First, I split the genres column on the "|" delimiter:

inner_join_movies_ratings.genres = inner_join_movies_ratings.genres.str.split("|")

This gives me the following data frame, where each value in the "genres" column is a list (so the column's dtype is object).

Data frame

Next, I create a variable "genre_list" that contains the genres associated with the movieId entered by the user.

def input_output(x):
    
    y = inner_join_movies_ratings.loc[inner_join_movies_ratings.movieId == x]
    
    #get genres
    genre_list = y.genres[0]
    print(genre_list)
    
    a = inner_join_movies_ratings[inner_join_movies_ratings['genres'].isin(genre_list)]
    print(a)

Instead of returning a data frame of all the movies that contain at least one of the genres listed in genre_list, I get:

['Adventure', 'Animation', 'Children', 'Comedy', 'Fantasy'] # The contents of genre_list
Empty DataFrame
Columns: [movieId, title, genres, rating]
Index: []

CodePudding user response:

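"isin" compares each cell as a whole against the values in genre_list. Every cell in "genres" is itself a list, so it never equals a single genre string and no row matches. Instead, you can use set intersection to test whether two lists overlap, and use apply() to run that check on every row.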
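As a small illustration of the overlap test on its own, here is a sketch using plain Python lists. The genre lists and the helper name shares_genre are made up for the example. It uses set.isdisjoint, which is equivalent to checking that the intersection is non-empty.

```python
# Hypothetical genre lists, for illustration only
selected = ["Adventure", "Animation", "Children", "Comedy", "Fantasy"]
other_a = ["Comedy", "Romance"]
other_b = ["Action", "Crime", "Thriller"]

selected_set = set(selected)


def shares_genre(genres):
    """Return True if `genres` has at least one genre in common with `selected_set`."""
    # isdisjoint() is True when the two collections share no elements,
    # so its negation means "the lists overlap".
    return not selected_set.isdisjoint(genres)


print(shares_genre(other_a))  # True: "Comedy" appears in both lists
print(shares_genre(other_b))  # False: no genre in common
```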

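# Genres of the selected movie; .iloc[0] takes the first row by position, whatever its index label
genre_set = set(y.genres.iloc[0])

# Keep the rows whose genre list shares at least one genre with genre_set
a = inner_join_movies_ratings[inner_join_movies_ratings['genres'].apply(lambda g: len(genre_set.intersection(g)) > 0)]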
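To try the approach end to end, the following is a minimal, self-contained sketch. The data frame `movies` is a made-up stand-in for inner_join_movies_ratings, and the function name movies_sharing_genres is hypothetical. It also assumes you want to exclude the selected movie itself from the result, which the question implies ("other movies") but does not show.

```python
import pandas as pd

# Made-up rows standing in for inner_join_movies_ratings (assumption)
movies = pd.DataFrame({
    "movieId": [1, 2, 6, 7],
    "title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "genres": [
        "Adventure|Animation|Children|Comedy|Fantasy",
        "Adventure|Children|Fantasy",
        "Action|Crime|Thriller",
        "Comedy|Romance",
    ],
    "rating": [4.0, 3.5, 4.5, 3.0],
})

# Same preprocessing as in the question: each cell becomes a list of genres
movies["genres"] = movies["genres"].str.split("|")


def movies_sharing_genres(df, movie_id):
    """Return the rows of `df` that share at least one genre with `movie_id`."""
    selected = df.loc[df["movieId"] == movie_id]
    if selected.empty:
        # Unknown movieId: return an empty frame with the same columns
        return df.iloc[0:0]

    # .iloc[0] picks the first matching row by position, not by index label
    genre_set = set(selected["genres"].iloc[0])

    # True for rows whose genre list overlaps genre_set
    mask = df["genres"].apply(lambda g: not genre_set.isdisjoint(g))

    # Assumption: drop the selected movie itself from the result
    return df[mask & (df["movieId"] != movie_id)]


result = movies_sharing_genres(movies, 1)
print(result[["movieId", "title"]])  # movieId 2 and 7 match; 6 shares no genre
```

If the selected movie should stay in the result, drop the second condition and return df[mask].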