I have two datasets. Below you can see codes and data
import pandas as pd
import numpy as np
pd.set_option('max_columns', None)
import matplotlib.pyplot as plt
data = {'type_sale': ['group_1','group_2','group_3','group_4','group_5','group_6','group_7','group_8','group_9','group_10'],
'id':[70,20,24,80,20,20,60,20,20,20],
}
df1 = pd.DataFrame(data, columns = ['type_sale',
'id',])
data = {'type_sale': ['group_1','group_2','group_3'],
'id':[70,20,24],
}
df2 = pd.DataFrame(data, columns = ['type_sale',
'id',])
These codes created two datasets that are shown below :
Now I want to create a new data set df3 with values from df1 that are different (distinct values) from the values df2 in the column id
.
The final results should as pic below
I tried with these codes but are not giving desired results.
df = pd.concat((df1, df2))
print(df.drop_duplicates('id'))
So can anybody help me how to solve this problem?
CodePudding user response:
Try as follows:
- Use
df.isin
to check for each value indf['id']
whether it is contained indf2['id']
. - Next, invert the resulting boolean
pd.Series
by using the unary operator~
(tilde) and select fromd1
. - Finally, reset the index.
In a one-liner:
df3 = df1[~df1['id'].isin(df2['id'])].reset_index(drop=True)
print(df3)
type_sale id
0 group_4 80
1 group_7 60