Home > Software engineering >  Comparison of two data frames and getting the differences by two columns - Pandas
Comparison of two data frames and getting the differences by two columns - Pandas

Time:10-20

I would like to compare two data frames, for example

import pandas as pd

az_df = pd.DataFrame({'name': ['CR1', 'CR2'], 'age': [1, 5], 'dr':[1, 2]})[['name', 'age']]
za_df = pd.DataFrame({'name': ['CR2', 'CR1'], 'age': [2, 1], 'dr':[2, 4]})[['name', 'age']]

AZ_DF table:

name age dr
CR1 1 1
CR2 5 2

ZA_DF table:

name age dr
CR1 1 4
CR2 2 2

And I want to get the summary table of different values grouped by 'name' and 'age' columns between az_df and za_df, like:

name only in AZ only in ZA
CR2 5 2

So far, I did merged them,

merge = pd.merge(az_df, za_df, how='outer', indicator=True)

For az_df, different values are:

only_in_az = merge[merge['_merge'] == 'left_only']

And for za_df:

only_in_za = merge[merge['_merge'] == 'right_only']

However, I don't know how to build the summary table, I mentioned above, showing the different names' ages for az and za data frames.

Thank you.

CodePudding user response:

I think you can get what you want from the merge dataframe using pivot_table:

pd.pivot_table(merge.query('_merge != "both"'), values='age', index='name', columns='_merge').reset_index().rename_axis(None, axis=1).rename(columns={'left_only':'only in AZ','right_only':'only in ZA'})

name    only in AZ  only in ZA
0   CR2 5   2

CodePudding user response:

Try this:

import pandas as pd

az_df = pd.DataFrame({'name': ['CR1', 'CR2'], 'age': [1, 5], 'dr':[1, 2]})[['name', 'age']]
za_df = pd.DataFrame({'name': ['CR2', 'CR1'], 'age': [2, 1], 'dr':[2, 4]})[['name', 'age']]

merge = pd.merge(az_df, za_df, on='name', how='outer')
merge.rename(columns={'age_x': 'only in AZ', 'age_y': 'only in ZA'}, inplace=True)
merge

  name  only in AZ  only in ZA
0  CR1           1           1
1  CR2           5           2

If you want to remove duplicates:

merge = merge[merge['only in AZ'] != merge['only in ZA']]
merge

  name  only in AZ  only in ZA
1  CR2           5           2
  • Related