I have two data frames. My final goal is to compare a column in both data frames and return those values which donot match with each other example:
df_1["column_1"]= ["A45", "kl24", "mhg", "tz22" ]
df_2["column_2"]= ["KL24", "tz22", "mhg", "A 45"]
I need a code that comparing two array values in the respective dataframe["column"] and returns those values from df_1 which did not match in df_2(Ex: from our example "A45" and "kl24" will return because there is a space and upper and lower case error)
Can anyone kindly please help me with this !
CodePudding user response:
Use pd.merge(..., how='left', indicator=True)
and select the rows with left_only
:
>>> df = df_1.merge(df_2, how='outer', left_on='column_1', right_on='column_2', indicator=True)
>>> df
column_1 column_2 _merge
0 A45 NaN left_only
1 kl24 NaN left_only
2 mhg mhg both
3 tz22 tz22 both
4 NaN KL24 right_only
5 NaN A 45 right_only
>>> df[df._merge == 'left_only']['column_1']
0 A45
1 kl24
Name: column_1, dtype: object
CodePudding user response:
You can use df.isin
and df.loc
to do the trick
df_1.loc[~df_1["column_1"].isin(df_2["column_2"])]
column_1
0 A45
1 kl24