Home > other >  Invert selection in Pandas
Invert selection in Pandas

Time:09-06

I have two datasets. Below you can see codes and data

import pandas as pd
import numpy as np
pd.set_option('max_columns', None)
import matplotlib.pyplot as plt

data = {'type_sale': ['group_1','group_2','group_3','group_4','group_5','group_6','group_7','group_8','group_9','group_10'],
         'id':[70,20,24,80,20,20,60,20,20,20],
        }
df1 = pd.DataFrame(data, columns = ['type_sale',
                                      'id',])


data = {'type_sale': ['group_1','group_2','group_3'],
         'id':[70,20,24],
        }
df2 = pd.DataFrame(data, columns = ['type_sale',
                                      'id',])

These codes created two datasets that are shown below :

df1

df2

Now I want to create a new data set df3 with values from df1 that are different (distinct values) from the values df2 in the column id.

The final results should as pic below

enter image description here

I tried with these codes but are not giving desired results.

df = pd.concat((df1, df2))
print(df.drop_duplicates('id'))

So can anybody help me how to solve this problem?

CodePudding user response:

Try as follows:

  • Use df.isin to check for each value in df['id'] whether it is contained in df2['id'].
  • Next, invert the resulting boolean pd.Series by using the unary operator ~ (tilde) and select from d1.
  • Finally, reset the index.

In a one-liner:

df3 = df1[~df1['id'].isin(df2['id'])].reset_index(drop=True)

print(df3)

  type_sale  id
0   group_4  80
1   group_7  60
  • Related