I have 2 dataframes which I am comparing with below snippet:
df3 = pandas.concat([df1, df2]).drop_duplicates(keep=False)
It works fine, it compares both and as an output I got rows that are different form both of them.
What I would like to achieve is to compare 2 dataframes to get rows that are different but as an output only get/keep rows from the first DataFrame.
Is there an easy way to do this?
CodePudding user response:
I would use ~isin()
:
df.set_index(list(df.columns), inplace=True)
df2.set_index(list(df2.columns), inplace=True)
df[~df.index.isin(df2.index)].reset_index()
CodePudding user response:
If you only want the unique rows from the first dataframe, you really want a left join.
df3 = df1.merge(df2.drop_duplicates(), on='your_column_here',
how='left', indicator=True)
Now you can check the _merge
column and filter for left only:
col1 col2 _merge
0 1 10 both
1 2 11 both
2 3 12 both
3 4 13 left_only
4 5 14 left_only
5 3 10 left_only
CodePudding user response:
One way is to pre-mark the df's rows with a number (like .assign(mark=1)
) and drop the helper column after
df1 = pd.DataFrame(np.random.randint(-10, 10, 20)) # dummy data
df2 = pd.DataFrame(np.random.randint(-10, 10, 20)) # dummy data
df3 = pd.concat([df1.assign(mark=1), df2.assign(mark=2)]).drop_duplicates(keep=False)
print(df3[df3['mark'].eq(1)].drop(columns='mark'))
Prints:
0
2 -6
3 -8
14 3
16 -3