Check if list elements are in a DataFrame column

Objective: I have a list of 200 elements (URLs) and I would like to check whether each one appears in a specific column of a DataFrame. If it does, I would like to remove that element from the list.

Problem: I tried a similar approach, appending the elements that are not in the column to a new list, but it appends all of them.

pruned = []
for element in list1:
    if element not in transfer_history['Link']:
        pruned.append(element)

I have also tried removing the elements from the list directly, as described in the objective, without success. I think it's something simple, but I can't find the key.

for element in list1:
    if element in transfer_history['Link']:
        list1.remove(element)

CodePudding user response:

When you use the in operator with a pandas Series, it searches the index labels, not the values. That is why every element ends up in pruned. To get around this, convert the column to a list using transfer_history['Link'].tolist() or, better, convert it to a set, which makes each membership check fast.

links = set(transfer_history["Link"])
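To see the index-versus-values behaviour directly, here is a minimal sketch. The two example URLs and the small DataFrame are made up for illustration; only the column name Link comes from the question.

```python
import pandas as pd

# Hypothetical stand-in for the real transfer_history DataFrame.
transfer_history = pd.DataFrame(
    {"Link": ["https://a.example", "https://b.example"]}
)
col = transfer_history["Link"]

# `in` on a Series looks at the index labels (0 and 1 here).
in_index = 0 in col                              # True: 0 is an index label
in_series = "https://a.example" in col           # False: values are not searched

# Converting to a list or a set searches the actual values.
in_list = "https://a.example" in col.tolist()    # True
in_set = "https://a.example" in set(col)         # True

print(in_index, in_series, in_list, in_set)
```

With the default integer index, no URL string can ever match an index label, so the not in test in the question is true for every element.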

A good way to filter the list is like this:

pruned = [element for element in list1 if element not in links]

Don't remove elements from a list while iterating over it. Each removal shifts the remaining items one position to the left, so the loop skips the element that follows the removed one.
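A small sketch of what goes wrong when removing during iteration, using short made-up strings in place of the URLs:

```python
# Hypothetical sample data: "a" and "b" play the role of URLs found in the column.
links = {"a", "b"}
list1 = ["a", "b", "c", "d"]

# Removing while iterating: after "a" is removed, "b" slides into
# position 0 and the loop moves on to position 1, so "b" is never checked.
buggy = list(list1)
for element in buggy:
    if element in links:
        buggy.remove(element)

# Building a new list checks every element exactly once.
pruned = [element for element in list1 if element not in links]

print(buggy)   # ['b', 'c', 'd']
print(pruned)  # ['c', 'd']
```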

CodePudding user response:

Remember that transfer_history['Link'] is the entire column, not its individual values. To compare against each value you would have to access the items one by one, for example transfer_history['Link'][x], using a for loop over the column.

A much easier way is to check whether the item is in a list built from the entire column:

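# Build the list of column values once, outside the loop
links = [link for link in transfer_history['Link']]
pruned = []
for element in list1:
    # Keep only the elements that do not appear in the column
    if element not in links:
        pruned.append(element)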

CodePudding user response:

If the order of the URLs doesn't matter, and the list has no duplicates you need to keep, this can be simplified a lot using sets:

list1 = list(set(list1) - set(transfer_history['Link']))
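The trade-off of the set difference can be sketched with made-up data. Here links stands in for set(transfer_history['Link']), and the short strings stand in for URLs:

```python
# Hypothetical sample data; "u3" appears twice on purpose.
list1 = ["u3", "u1", "u3", "u2"]
links = {"u1"}  # stand-in for set(transfer_history['Link'])

# Set difference: duplicates collapse and the resulting order is arbitrary.
as_set = list(set(list1) - links)

# List comprehension: original order and duplicates are preserved.
as_list = [url for url in list1 if url not in links]

print(sorted(as_set))  # ['u2', 'u3']
print(as_list)         # ['u3', 'u3', 'u2']
```

If order or duplicates matter, the list comprehension against a set is likely the safer choice; otherwise the set difference is the shortest form.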