I would like to compare two data frames, for example
import pandas as pd
az_df = pd.DataFrame({'name': ['CR1', 'CR2'], 'age': [1, 5], 'dr':[1, 2]})[['name', 'age']]
za_df = pd.DataFrame({'name': ['CR2', 'CR1'], 'age': [2, 1], 'dr':[2, 4]})[['name', 'age']]
AZ_DF table:
name | age | dr |
---|---|---|
CR1 | 1 | 1 |
CR2 | 5 | 2 |
ZA_DF table:
name | age | dr |
---|---|---|
CR1 | 1 | 4 |
CR2 | 2 | 2 |
And I want to get the summary table of different values grouped by 'name' and 'age' columns between az_df and za_df, like:
name | only in AZ | only in ZA |
---|---|---|
CR2 | 5 | 2 |
So far, I did merged them,
merge = pd.merge(az_df, za_df, how='outer', indicator=True)
For az_df, different values are:
only_in_az = merge[merge['_merge'] == 'left_only']
And for za_df:
only_in_za = merge[merge['_merge'] == 'right_only']
However, I don't know how to build the summary table, I mentioned above, showing the different names' ages for az and za data frames.
Thank you.
CodePudding user response:
I think you can get what you want from the merge dataframe using pivot_table:
pd.pivot_table(merge.query('_merge != "both"'), values='age', index='name', columns='_merge').reset_index().rename_axis(None, axis=1).rename(columns={'left_only':'only in AZ','right_only':'only in ZA'})
name only in AZ only in ZA
0 CR2 5 2
CodePudding user response:
Try this:
import pandas as pd
az_df = pd.DataFrame({'name': ['CR1', 'CR2'], 'age': [1, 5], 'dr':[1, 2]})[['name', 'age']]
za_df = pd.DataFrame({'name': ['CR2', 'CR1'], 'age': [2, 1], 'dr':[2, 4]})[['name', 'age']]
merge = pd.merge(az_df, za_df, on='name', how='outer')
merge.rename(columns={'age_x': 'only in AZ', 'age_y': 'only in ZA'}, inplace=True)
merge
name only in AZ only in ZA
0 CR1 1 1
1 CR2 5 2
If you want to remove duplicates:
merge = merge[merge['only in AZ'] != merge['only in ZA']]
merge
name only in AZ only in ZA
1 CR2 5 2