I am working on a project for university in which I received two dataframes:
import pandas as pd
df1 = pd.DataFrame({'id': ['0','1','2','3'],
'name': ['sensor', 'actuador', 'sensor', 'sensor'],
'type':['analog', 'analog', 'digital', 'analog']})
print(df1)
id name type
0 sensor analog
1 actuador analog
2 sensor digital
3 sensor analog
df2 = pd.DataFrame({'inst': ['0','10','2','143'],
'number': [100, 200, 300, 400]})
print(df2)
inst number
0 100
10 200
2 300
143 400
I would like to check if the 'id' of df1 is different from the 'inst' of df2. I would only need to return lines from df1 where the 'id' is not contained in the 'inst' of df2.
I thought of doing a merge between the dataframes using the 'id' and 'inst' columns as follows:
merged_df = df1.merge(df2, left_on=['id'], right_on=['inst'])
The output is:
print(merged_df)
id name type inst number
0 sensor analog 0 100
2 sensor digital 2 300
However, the desired output is the opposite of this. I would like it to return the df1 with only the lines where the 'id' is not in 'inst'. Also, I would like it to return only the df1 columns.
The desired output is:
print(df_result)
id name type
1 actuador analog
3 sensor analog
CodePudding user response:
If there is only one column use boolean indexing
with inverted mask by Series.isin
:
df_result = df1[~df1['id'].isin(df2['inst'])]
print (df_result)
id name type
1 1 actuador analog
3 3 sensor analog
If possible check per multiple columns is possible use alternative solution with indicator
parameter and filtering left_only
rows:
df_result = (df1.merge(df2, left_on=['id'], right_on=['inst'], how='left', indicator=True)
.query("_merge == 'left_only'")
.reindex(df1.columns, axis=1))
print (df_result)
id name type
1 1 actuador analog
3 3 sensor analog