I have two dataframes,
df1 df2
country country
US AR
US AD
CA AO
CN AU
AR US
How do I group by them by combining the country list to a set the compare the difference between two dataframes?
My expected output will be like,
country code df1_country_count df2_country_count
AR 1 1
AD 0 1
AO 0 1
AU 0 1
US 2 1
CA 1 0
CN 1 0
CodePudding user response:
(df1.value_counts().to_frame('df1_country_count')
.join(df2.value_counts().to_frame('df2_country_count'), how='outer')
.fillna(0).astype('int').rename_axis('country code'))
result:
df1_country_count df2_country_count
country code
AD 0 1
AO 0 1
AR 1 1
AU 0 1
CA 1 0
CN 1 0
US 2 1
CodePudding user response:
You can use value_counts
and then concat.
out = pd.concat([df1.country.value_counts(),
df2.country.value_counts()], axis=1).fillna(0).astype(int)
out.columns = ['df1_country', 'df2_country']
print(out)
df1_country df2_country
US 2 1
CA 1 0
CN 1 0
AR 1 1
AD 0 1
AO 0 1
AU 0 1