Compare rows of two dataframes in pandas-CodePudding

I have two dataframes, the first is the data I currently have in the database, the second would be a file that might have changed fields: name and/or cnpj and/or create_date

Based on that, I need to create a third dataframe with only the rows that have undergone some kind of change, as in the example of the expected output.

The key to making the comparisons needs to be: id_account

Dataframe 1:

id_account	name	cnpj	create_date
10	Agency Criss	10203040	2022-05-30
20	Agency Angel	11213141	2022-05-30
30	Supermarket Mario Bros	12223242	2022-05-30
40	Agency Mister M	13233343	2022-05-30
50	Supermarket Pokemon	14243454	2022-05-30
60	Supermarket of Dreams	15253580	2022-05-30

Dataframe 2:

id_account	name	cnpj	create_date
10	Supermarket Carol	80502030	2022-05-30
20	Agency Angel	11213141	2022-05-30
30	Supermarket Mario Bros	12223242	2022-05-30
40	Supermarket Magical	60304050	2022-05-30
50	Supermarket Pokemon	14243454	2022-05-30
60	Supermarket of Dreams	90804050	2022-05-30

Expected output:

id_account	name	cnpj	create_date
10	Supermarket Carol	80502030	2022-05-30
40	Supermarket Magical	60304050	2022-05-30
60	Supermarket of Dreams	90804050	2022-05-30

How can I do this? I've looked for a few ways, but I'm confused by the index.

CodePudding user response：

If the data has same columns, but different number of rows, this is one possible solution:

res = (pd.concat([df1,df2])
       .drop_duplicates(keep=False)
       .drop_duplicates(subset='id_account', keep='last')
      )

Output:

   id_account                   name      cnpj create_date
0          10      Supermarket Carol  80502030  2022-05-30
3          40    Supermarket Magical  60304050  2022-05-30
5          60  Supermarket of Dreams  90804050  2022-05-30

CodePudding user response：

If your two dataframes have the same columns and the same number of rows, you can check equate them and use any(axis=1) to find rows where any columns changes:

new_df = df2[(df1 != df2).any(axis=1)]

Output:

>>> new_df
   id_account                   name      cnpj create_date
0          10      Supermarket Carol  80502030  2022-05-30
3          40    Supermarket Magical  60304050  2022-05-30
5          60  Supermarket of Dreams  90804050  2022-05-30