Compare 2 pandas.DataFrames, get differences and print only rows that changed from the first one-CodePudding

I have 2 dataframes which I am comparing with below snippet:

df3 = pandas.concat([df1, df2]).drop_duplicates(keep=False)

It works fine, it compares both and as an output I got rows that are different form both of them.

What I would like to achieve is to compare 2 dataframes to get rows that are different but as an output only get/keep rows from the first DataFrame.

Is there an easy way to do this?

CodePudding user response：

I would use ~isin():

df.set_index(list(df.columns), inplace=True)
df2.set_index(list(df2.columns), inplace=True)
df[~df.index.isin(df2.index)].reset_index()

CodePudding user response：

If you only want the unique rows from the first dataframe, you really want a left join.

df3 = df1.merge(df2.drop_duplicates(), on='your_column_here', 
                   how='left', indicator=True)

Now you can check the _merge column and filter for left only:

    col1    col2    _merge
0      1      10    both
1      2      11    both
2      3      12    both
3      4      13    left_only
4      5      14    left_only
5      3      10    left_only

CodePudding user response：

One way is to pre-mark the df's rows with a number (like .assign(mark=1)) and drop the helper column after

df1 = pd.DataFrame(np.random.randint(-10, 10, 20)) # dummy data
df2 = pd.DataFrame(np.random.randint(-10, 10, 20)) # dummy data

df3 = pd.concat([df1.assign(mark=1), df2.assign(mark=2)]).drop_duplicates(keep=False)
print(df3[df3['mark'].eq(1)].drop(columns='mark'))

Prints: