Invert selection in Pandas-CodePudding

I have two datasets. Below you can see codes and data

import pandas as pd
import numpy as np
pd.set_option('max_columns', None)
import matplotlib.pyplot as plt

data = {'type_sale': ['group_1','group_2','group_3','group_4','group_5','group_6','group_7','group_8','group_9','group_10'],
         'id':[70,20,24,80,20,20,60,20,20,20],
        }
df1 = pd.DataFrame(data, columns = ['type_sale',
                                      'id',])


data = {'type_sale': ['group_1','group_2','group_3'],
         'id':[70,20,24],
        }
df2 = pd.DataFrame(data, columns = ['type_sale',
                                      'id',])

These codes created two datasets that are shown below :

Now I want to create a new data set df3 with values from df1 that are different (distinct values) from the values df2 in the column id.

The final results should as pic below

I tried with these codes but are not giving desired results.

df = pd.concat((df1, df2))
print(df.drop_duplicates('id'))

So can anybody help me how to solve this problem?

CodePudding user response：

Try as follows:

Use df.isin to check for each value in df['id'] whether it is contained in df2['id'].
Next, invert the resulting boolean pd.Series by using the unary operator ~ (tilde) and select from d1.
Finally, reset the index.

In a one-liner:

df3 = df1[~df1['id'].isin(df2['id'])].reset_index(drop=True)

print(df3)

  type_sale  id
0   group_4  80
1   group_7  60