I am trying to compare a pandas DataFrame with a list. I have extracted IDs into a list called list_x.
Since several rows share the same ID, the list contains duplicates, e.g. list_x = [1,1,1,1,2,3, etc.]
I am trying to drop every DataFrame row whose ID is also in the list.
What I have been trying are variations of:
for j in range(len(dataframe)-1):
    if dataframe.loc(j,"ID") in list_x: dataframe.drop([j], inplace = True)
or variations of:
for j in range(len(dataframe)-1):
    for k in range(len(list_x)-1):
        if dataframe.loc(j,"ID") in list_x[k]: dataframe.drop([j], inplace = True)
I get an error, which I think comes from comparing the list's index with the DataFrame rather than with the actual list entry.
Any help would be appreciated. I do realize code snippets are discouraged, but I would appreciate an example if possible! Thanks in advance :)
CodePudding user response:
You want the DataFrame without the rows whose IDs appear in list_x, so you can do this:
Your df (2 columns: ID and value):
import pandas as pd
df = pd.DataFrame({'ID': [1,3,5,6,7], 'value' : ['red', 'blue', 'green', 'orange', 'purple']})
The list of IDs you don't want in the DataFrame:
list_x = [1,1,2,3,5]
The output (only the rows whose ID is not in list_x):
df = df[~df.ID.isin(list_x)]
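As for why the loops in the question fail: .loc is an indexer and is used with square brackets (dataframe.loc[j, "ID"]), so calling it with parentheses is a likely source of the error. In the second version, list_x[k] is a single integer, and "in" cannot test membership in an integer. Note also that range(len(dataframe)-1) skips the last row. The sketch below puts the answer together as a self-contained script and shows an equivalent variant based on drop; the names mask, filtered and dropped are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 3, 5, 6, 7],
    "value": ["red", "blue", "green", "orange", "purple"],
})
list_x = [1, 1, 2, 3, 5]  # duplicates in the list are harmless for isin

# Boolean Series: True where the row's ID appears in list_x.
mask = df["ID"].isin(list_x)

# Keep only the rows whose ID is NOT in list_x, then renumber the index.
filtered = df[~mask].reset_index(drop=True)

# Equivalent variant using drop: remove the index labels selected by the mask.
dropped = df.drop(df.index[mask])

print(filtered)
```

Both variants are vectorized, so no explicit loop over the rows is needed, and neither modifies df while it is being iterated.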