Home > database >  Merging different sized data frames and plotting the difference of a column
Merging different sized data frames and plotting the difference of a column

Time:10-04

i have two dataframes Region_education_0 and Region_education_1 Region_education_0

index Region ConvertedComp
1 Australia/New Zealand 122573.834171
2 Caribbean 53562.111111
3 Central Asia 134422.000000
4 East Asia 112492.507042
5 Melanesia 605

Region_education_1

index Region ConvertedComp
1 Australia/New Zealand 122573.834171
2 Caribbean 53562.111111
3 Central Asia 134422.000000
4 East Asia 112492.507042

Index 5, Melanesia is not present in Region_education_1 because of a condition, i want to compare them and plot so i tried this

from matplotlib.pyplot import *

Region_education_combined=Region_education_0.merge(Region_education_1,left_on="Region",right_on="Region")
Region_education_combined.columns=["Region","Max of Bachelors Higher Ed","Higher Formal Education"]
Region_education_combined['Diff_HigherEd_Vals'] = Region_education_combined['Higher Formal Education'] - Region_education_combined['Max of Bachelors Higher Ed']
print(Region_education_combined)
comp_df.style.bar(subset=['Diff_HigherEd_Vals'], align='mid', color=['#d65f5f', '#5fba7d'])

index Max of Bachelors Higher Ed Higher Formal Education Diff_HigherEd_Vals
1 151698.500659 122573.834171 -29124.666488
2 28413.753425 53562.111111 53562.111111
3 3944.750000 5883.000000 1938.250000
4 45091.041667 27052.384615 -18038.657051

Region column is missing from the output,to include I region tried


comp_df.style.bar(subset=['Diff_HigherEd_Vals','Region'], align='mid', color=['#d65f5f', '#5fba7d'])

and


comp_df.style.bar(Region_education_combined, align='mid', color=['#d65f5f', '#5fba7d'])

Is there any way to include region in the final output? and i left out "Index 5, Melanesia" from 'Region_education_0' dataframe is there any way to include that too in the output ?

CodePudding user response:

You can maintain the missing Region by using how="outer" when you call merge like this

Region_education_combined=Region_education_0.merge(Region_education_1,left_on="Region",right_on="Region")

Pay attention that in this case you will have a table which contains Nan where it is not possible to merge, in your case Melanesia will have a Nan in the Higher Formal Education column. In order to avoid problem you can set a default value with this

Region_education_combined["Higher Formal Education"]fillna(0, inplace=True)
  • Related