How to use pandas to check for list of values from a csv spread sheet while filtering out certain ke

Time:09-17

Hey guys, this is my first post. I am planning to build an anime recommendation engine in Python. I ran into a problem: I made a list called genre_list that stores the genres I want to filter for in the large spreadsheet I was given. I am using the pandas library, whose isin() function checks whether the values in a column are included in a list, and it is supposed to filter the rows accordingly. I am using the function, but it does not detect "Action" in the data even though it is there. I have a feeling something is wrong with the data types and that I need to work around it somehow, but I'm not sure how.

For anyone interested, I downloaded my CSV file from this link: https://www.kaggle.com/datasets/marlesson/myanimelist-dataset-animes-profiles-reviews?resource=download

import pandas as pd

df = pd.read_csv('animes.csv')

genre = True
genre_list = []

while genre:
    genre_input = input("What genres would you like to watch?, input \"done\" when done listing!\n")
    if genre_input == "done":
        genre = False
    else:
        genre_list.append(genre_input)
print(genre_list)
df_genre = df[df["genre"].isin(genre_list)]
# df_genre = df["genre"]
print(df_genre) 

Output: [1]: https://i.stack.imgur.com/XZzcc.png

CodePudding user response:

You want to check whether ANY value in your user input list appears in each cell of the "genre" column, where every cell holds a list of genres. The isin() function checks whether a cell value, in its entirety, equals one of your inputs, which is not what you want here. Change that line to this:

df_genre = df[df['genre'].apply(lambda x: any([i in x for i in genre_list]))]

Let me know if you need any more help.
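
To make the difference concrete, here is a minimal sketch on a tiny made-up frame. It assumes each "genre" cell is a string that looks like a Python list (for example "['Action', 'Comedy']"); the sample rows and titles are hypothetical and only stand in for animes.csv.

```python
import pandas as pd

# Hypothetical stand-in for animes.csv; the real column layout is an assumption.
df = pd.DataFrame({
    "title": ["A", "B", "C"],
    "genre": ["['Action', 'Comedy']", "['Drama']", "['Action']"],
})
genre_list = ["Action"]

# isin() compares each whole cell against the list elements.
# No cell is exactly the string "Action", so nothing matches.
exact = df[df["genre"].isin(genre_list)]

# A per-cell substring test keeps rows that mention any requested genre.
mask = df["genre"].apply(lambda cell: any(g in cell for g in genre_list))
partial = df[mask]

print(len(exact))
print(partial["title"].tolist())
```

Note that the substring test would raise a TypeError on a missing (NaN) cell, so real data may need a df["genre"].fillna("") first.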

CodePudding user response:

import pandas as pd

df = pd.read_csv('animes.csv')

genre = True
genre_list = []

while genre:
    genre_input = input("What genres would you like to watch?, input \"done\" when done listing!\n")
    if genre_input == "done":
        genre = False
    else:
        genre_list.append(genre_input)

# List of all cells and their genre put into a list
col_list = df["genre"].values.tolist()
temp_list = []

# Each val in the list is compared with the genre_list to see if there is a match
for index, val in enumerate(col_list):
    if all(x in val for x in genre_list):
        # If there is a match, the UID of that cell is added to a temp_list
        temp_list.append(df['uid'].iloc[index])
print(temp_list)

# This checks if the UID is contained in the temp_list of UIDs that have these genres
df_genre = df["uid"].isin(temp_list)
new_df = df.loc[df_genre, "title"]
# Prints all Anime with the specified genres
print(new_df)

This is another approach I took, and it works as well. Thanks for all the help :D
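
The loop above can probably be shortened by building a boolean mask directly, without collecting UIDs first. The sketch below is one possible variant, not the only way to do it. It assumes the "genre" cells are strings shaped like Python lists, parses them with ast.literal_eval, and compares sets, which avoids accidental substring hits (a hypothetical case: searching for "Shounen" also matching a cell that contains "Shounen Ai"). The sample frame and the parse_genres helper are made up for illustration.

```python
import ast
import pandas as pd

# Hypothetical sample rows standing in for animes.csv.
df = pd.DataFrame({
    "uid": [1, 2, 3, 4],
    "title": ["A", "B", "C", "D"],
    "genre": ["['Action', 'Comedy']", "['Drama']", "['Action']", None],
})
genre_list = ["Action", "Comedy"]

def parse_genres(cell):
    """Parse a cell like "['Action', 'Comedy']" into a set; missing cells give an empty set."""
    if not isinstance(cell, str):
        return set()
    return set(ast.literal_eval(cell))

genre_sets = df["genre"].apply(parse_genres)
wanted = set(genre_list)

# Rows whose genres include every requested genre (same idea as all(...) above).
has_all = genre_sets.apply(lambda s: wanted.issubset(s))
# Rows whose genres include at least one requested genre.
has_any = genre_sets.apply(lambda s: bool(wanted & s))

all_titles = df.loc[has_all, "title"].tolist()
any_titles = df.loc[has_any, "title"].tolist()
print(all_titles)
print(any_titles)
```

ast.literal_eval raises an error on cells that are not valid Python literals, so this only fits if the column really is formatted that way.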

CodePudding user response:

To make a selection from a dataframe, you can write this:

df_genre = df.loc[df['genre'].isin(genre_list)]