I have the following dataframe
data = [
{'col1': 11, 'col2': 111, 'col3': 1111},
{'col1': 22, 'col2': 222, 'col3': 2222},
{'col1': 33, 'col2': 333, 'col3': 3333},
{'col1': 44, 'col2': 444, 'col3': 4444}
]
and the following list:
lst = [(11, 111), (22, 222), (99, 999)]
I would like to get out of my data only rows that col1 and col2 do not exist in the lst
result for above example would be:
[
{'col1': 33, 'col2': 333, 'col3': 3333},
{'col1': 44, 'col2': 444, 'col3': 4444}
]
how can I achieve that?
import pandas as pd
df = pd.DataFrame(data)
list_df = pd.DataFrame(lst)
# command like ??
# df.subtract(list_df)
CodePudding user response:
You can create a tuple out of your col1 and col2 columns and then check if those tuples are in the lst list. Then drop the fines with True values.
df.drop(df.apply(lambda x: (x['col1'], x['col2']), axis =1)
.isin(lst)
.loc[lambda x: x==True]
.index)
With this solution you don't even have to make the second list a dataframe
CodePudding user response:
If need test by pairs is possible compare MultiIndex
created by both columns in Index.isin
with inverted mask by ~
in boolean indexing
:
df = df[~df.set_index(['col1','col2']).index.isin(lst)]
print (df)
col1 col2 col3
2 33 333 3333
3 44 444 4444
Or with left join by merge
with indicator parameter:
m = df.merge(list_df,
left_on=['col1','col2'],
right_on=[0,1],
indicator=True,
how='left')['_merge'].eq('left_only')
df = df[mask]
print (df)
col1 col2 col3
2 33 333 3333
3 44 444 4444
CodePudding user response:
You can create the tuples of col1
and col2
by .apply()
with tuple
. Then test these tuples whether in lst
by .isin()
(add ~
for the negation/opposite condition).
Finally, locate the rows with .loc
, as follows:
df.loc[~df[['col1', 'col2']].apply(tuple, axis=1).isin(lst)]
Result:
col1 col2 col3
2 33 333 3333
3 44 444 4444
CodePudding user response:
You can extract the list of values using zip
and slice using a mask generated with isna
:
a,b = zip(*lst)
data[~(data['col1'].isin(a)|data['col2'].isin(b))]
output:
col1 col2 col3
2 33 333 3333
3 44 444 4444
Or if you need both conditions to be true to drop:
data[~(data['col1'].isin(a)&data['col2'].isin(b))]
NB. if you have many columns, you can automate the process:
mask = sum(data[col].isin(v) for col,v in zip(data, zip(*lst))).eq(0)
df[mask]