Objective: I have a list of 200 elements (URLs), and I would like to check whether each one is in a specific column of a DataFrame. If it is, I would like to remove that element from the list.
Problem: I tried a similar approach, appending the elements that are not in the column to a new list, but it adds all of them.
pruned = []
for element in list1:
    if element not in transfer_history['Link']:
        pruned.append(element)
I have also tried the approach I originally described (removing matching elements directly from the list), without success. I think it's something simple, but I can't find the key.
for element in list1:
    if element in transfer_history['Link']:
        list1.remove(element)
Answer:
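When you use the in operator with a pandas Series, you are searching the index, not the values. To get around this, convert the column to a list using transfer_history['Link'].tolist(), or better, convert it to a set, since a membership test on a set takes constant time on average instead of scanning the whole list.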
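To make the index-versus-values behaviour concrete, here is a minimal sketch. It assumes pandas is installed and uses a small made-up DataFrame standing in for transfer_history (the URLs are hypothetical). With the default RangeIndex, the index labels are 0 and 1, so that is what the in operator sees.

```python
import pandas as pd

# Hypothetical stand-in for transfer_history; the URLs are made up.
transfer_history = pd.DataFrame(
    {"Link": ["https://example.com/a", "https://example.com/b"]}
)
links = transfer_history["Link"]

# `in` on a Series checks the index labels (here 0 and 1), not the values.
print("https://example.com/a" in links)           # False
print(0 in links)                                 # True

# Checking against the values gives the expected answer.
print("https://example.com/a" in links.tolist())  # True
print("https://example.com/a" in set(links))      # True
```

This is why the question's first loop appends every URL: no URL is ever an index label, so "not in" is always true.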
links = set(transfer_history["Link"])
A good way to filter the list is like this:
pruned = [element for element in list1 if element not in links]
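Don't remove elements from a list while iterating over it: each removal shifts the remaining elements left, so the loop skips the element right after the one removed and some matches are never checked.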
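A small pure-Python sketch of that pitfall, using hypothetical data in which two adjacent items ("b" and "c") should both be dropped; a set stands in for the DataFrame column.

```python
# Hypothetical data: "b" and "c" both appear in the column and should be removed.
list1 = ["a", "b", "c", "d"]
to_drop = {"b", "c"}

# Anti-pattern: removing while iterating. After "b" is removed, "c" shifts into
# its position and the loop moves on to the next index, so "c" is never checked.
buggy = list1.copy()
for element in buggy:
    if element in to_drop:
        buggy.remove(element)
print(buggy)   # ['a', 'c', 'd']

# Building a new list with a comprehension avoids the problem.
pruned = [element for element in list1 if element not in to_drop]
print(pruned)  # ['a', 'd']
```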
Answer:
Remember, transfer_history['Link'] is the entire column (a Series) itself. To get at individual items, you index into it, as in transfer_history['Link'][x], or use a for loop to iterate through each item in the column.
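A minimal sketch of both ways of reaching individual items, assuming pandas is installed and using a hypothetical two-row column. Note that square-bracket access on a Series looks up by index label; with the default RangeIndex the labels happen to match positions, while .iloc is always positional.

```python
import pandas as pd

# Hypothetical column; the URLs are made up.
transfer_history = pd.DataFrame(
    {"Link": ["https://example.com/a", "https://example.com/b"]}
)
column = transfer_history["Link"]

# Iterating over a Series yields its values (not its index labels).
values = [link for link in column]
print(values)  # ['https://example.com/a', 'https://example.com/b']

# [] looks up by index label (0, 1 here); .iloc looks up by position.
print(column[0])       # https://example.com/a
print(column.iloc[1])  # https://example.com/b
```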
An easier way is to check whether the item is in a list built from the entire column with a one-liner list comprehension:
pruned = []
links = [link for link in transfer_history['Link']]  # build the list of values once, outside the loop
for element in list1:
    if element not in links:
        pruned.append(element)
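As an alternative that is not taken from the answers here (swapped in as a pandas-native option), Series.isin compares against a column's values and returns a boolean mask. This sketch assumes pandas is installed and uses hypothetical URLs; it keeps the original order and any duplicates in list1.

```python
import pandas as pd

# Hypothetical inputs; the URLs are made up.
transfer_history = pd.DataFrame(
    {"Link": ["https://example.com/b", "https://example.com/c"]}
)
list1 = ["https://example.com/a", "https://example.com/b", "https://example.com/d"]

# isin checks each candidate against the column's values, giving a boolean mask.
candidates = pd.Series(list1)
already_present = candidates.isin(transfer_history["Link"])

# Keep only the URLs that are not already in the column, in their original order.
pruned = candidates[~already_present].tolist()
print(pruned)  # ['https://example.com/a', 'https://example.com/d']
```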
Answer:
If the order of the URLs doesn't matter, this can be simplified a lot using sets (note that converting to a set also drops any duplicate URLs in list1):
list1 = list(set(list1) - set(transfer_history['Link']))
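A short sketch contrasting the set difference with an order-preserving filter, assuming pandas is installed; the short strings are hypothetical placeholders for URLs. The set result is sorted before printing only because set iteration order is not guaranteed.

```python
import pandas as pd

# Hypothetical inputs; short strings stand in for URLs.
transfer_history = pd.DataFrame({"Link": ["u2"]})
list1 = ["u3", "u1", "u2", "u1"]

# Set difference: concise, but loses the original order and any duplicates.
as_set = list(set(list1) - set(transfer_history["Link"]))
print(sorted(as_set))  # ['u1', 'u3']

# Order-preserving variant: build the set once, then filter the list.
seen_in_column = set(transfer_history["Link"])
ordered = [url for url in list1 if url not in seen_in_column]
print(ordered)  # ['u3', 'u1', 'u1']
```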