I'm trying to build a function that takes the genres linked to a given movieId, stored as a list, and returns other movies that share one or more of those genres. I can create the list and have confirmed that it is a list, but when I pass it to "isin" as a filter, I get an empty dataframe.
First, I split the genres column on the "|" delimiter:
inner_join_movies_ratings.genres = inner_join_movies_ratings.genres.str.split("|")
This gives me the following data frame where the "genres" column is an object.
Next, I create a variable "genre_list" that contains the genres associated with the movieId entered by the user.
def input_output(x):
    y = inner_join_movies_ratings.loc[inner_join_movies_ratings.movieId == x]
    # get genres
    genre_list = y.genres[0]
    print(genre_list)
    a = inner_join_movies_ratings[inner_join_movies_ratings['genres'].isin(genre_list)]
    print(a)
Instead of "a" being a data frame of all the movies that contain at least one of the genres listed in genre_list, I get:
['Adventure', 'Animation', 'Children', 'Comedy', 'Fantasy'] # The contents of genre_list
Empty DataFrame
Columns: [movieId, title, genres, rating]
Index: []
CodePudding user response:
You can use set intersection to test if two lists overlap. Use apply()
to check this for every row.
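# .iloc[0] takes the first matching row by position; y.genres[0] looks up index label 0 and fails if that label is absent
genre_set = set(y.genres.iloc[0])
a = inner_join_movies_ratings[inner_join_movies_ratings['genres'].apply(lambda g: len(genre_set.intersection(g)) > 0)]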
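For context, isin() compares each cell as a whole against the values you pass in. After the split, each cell in "genres" is a list rather than a single genre string, so no cell equals 'Adventure', 'Comedy', and so on, which would explain the empty result. Below is a minimal, self-contained sketch of the set-based filter wrapped in a function. The sample data, the `movies` name, and the `movies_sharing_genres` helper are made up for illustration and stand in for your inner_join_movies_ratings frame. Dropping the queried movie from the result is an assumption based on "other movies" in the question.

```python
import pandas as pd

# Hypothetical sample data standing in for inner_join_movies_ratings.
movies = pd.DataFrame({
    "movieId": [1, 2, 3, 4],
    "title": ["Toy Story", "Jumanji", "Heat", "Sabrina"],
    "genres": [
        "Adventure|Animation|Children|Comedy|Fantasy",
        "Adventure|Children|Fantasy",
        "Action|Crime|Thriller",
        "Comedy|Romance",
    ],
})
# Same step as in the question: turn "A|B|C" into ["A", "B", "C"].
movies["genres"] = movies["genres"].str.split("|")


def movies_sharing_genres(df, movie_id):
    """Return the rows of df that share at least one genre with movie_id."""
    match = df.loc[df["movieId"] == movie_id, "genres"]
    if match.empty:
        # Unknown id: return an empty frame with the same columns.
        return df.iloc[0:0]
    # Positional access, so it works whatever the row's index label is.
    genre_set = set(match.iloc[0])
    # isdisjoint() is True when nothing is shared, so negate it.
    overlaps = df["genres"].apply(lambda g: not genre_set.isdisjoint(g))
    # Assumption: exclude the queried movie itself from the result.
    return df[overlaps & (df["movieId"] != movie_id)]


result = movies_sharing_genres(movies, 1)
print(result["title"].tolist())  # ['Jumanji', 'Sabrina']
```

Using isdisjoint() instead of len(intersection(...)) > 0 is only a small convenience: it stops at the first shared genre instead of building the full intersection. Both give the same filter.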