Home > Enterprise >  Which regions have the lowest literacy rates?
Which regions have the lowest literacy rates?

Time:01-27

I need to group/merge each city of the same name and calculate its overall percentage, to see which city amongst them has the lowest % literacy rate. Code:

Python
import pandas as pd
df = pd.DataFrame({'Cities': ["Cape Town", "Cape Town", "Cape Town", "Tokyo", "Cape Town", "Tokyo", "Mumbai", "Belgium", "Belgium" ],
               'LiteracyRate': [0.05, 0.35, 0.2, 0.11, 0.15, 0.2, 0.65, 0.35, 0.45]})

print(df)

For example:

   Cities      LiteracyRate
0  Cape Town   0.05
1  Cape Town   0.35
2  Cape Town   0.2
3  Tokyo       0.11
4  Cape Town   0.15
5  Tokyo       0.2
6  Mumbai      0.65
7  Belgium     0.35
8  Belgium     0.45

I'm expecting this:

   Cities     LiteracyRate   %LiteracyRate
0  Cape Town  0.75            75
1  Tokyo      0.31            31
2  Mumbai     0.65            65
3  Belgium    0.8             80

So I tried this code below but it's not giving me desirable results, the countries with similar names are still not merged. And the percentages ain't right.

# Calculate the percentage
df["%LiteracyRate"] = (df["LiteracyRate"]/df["LiteracyRate"].sum())*100

# Show the DataFrame
print(df)

CodePudding user response:

You can use groupby() in pandas, to join cities with the same names and sum() to calculate %

df = df.groupby('Cities').sum()

Than you can format results using

df['%LiteracyRate'] = (df['LiteracyRate']*100).round().astype(int)

df = df.reset_index()

To sort them by literacy rate you can

df = df.sort_values(by='%LiteracyRate')

df = df.reset_index()

Hope this helps!

  • Related