How to create a new row in pandas dataframe by dividing values in a specific column between two rows

Time:12-16

I have a pandas dataframe df which looks as follows:

   Country  Continent  Capital City  Year   Indicator  Value     Unit
0    Nepal       Asia     Kathmandu  2015  Population      3  million
1    Nepal       Asia     Kathmandu  2020  Population      5  million
2  Germany     Europe        Berlin  2015  Population      4  million
3  Germany     Europe        Berlin  2020  Population      6  million

df.to_dict() is as shown below:

{'Country': {0: 'Nepal', 1: 'Nepal', 2: 'Germany', 3: 'Germany'},
 'Continent': {0: 'Asia', 1: 'Asia', 2: 'Europe', 3: 'Europe'},
 'Capital City': {0: 'Kathmandu', 1: 'Kathmandu', 2: 'Berlin', 3: 'Berlin'},
 'Year': {0: 2015, 1: 2020, 2: 2015, 3: 2020},
 'Indicator': {0: 'Population',
  1: 'Population',
  2: 'Population',
  3: 'Population'},
 'Value': {0: 3, 1: 5, 2: 4, 3: 6},
 'Unit': {0: 'million', 1: 'million', 2: 'million', 3: 'million'}}
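
To reproduce the frame, the dictionary above can be passed straight to pd.DataFrame. This is only a minimal sketch for rebuilding the sample data; the variable name data is an assumption for illustration.

```python
import pandas as pd

# Sample data copied from the df.to_dict() output above
data = {
    'Country': {0: 'Nepal', 1: 'Nepal', 2: 'Germany', 3: 'Germany'},
    'Continent': {0: 'Asia', 1: 'Asia', 2: 'Europe', 3: 'Europe'},
    'Capital City': {0: 'Kathmandu', 1: 'Kathmandu', 2: 'Berlin', 3: 'Berlin'},
    'Year': {0: 2015, 1: 2020, 2: 2015, 3: 2020},
    'Indicator': {0: 'Population', 1: 'Population',
                  2: 'Population', 3: 'Population'},
    'Value': {0: 3, 1: 5, 2: 4, 3: 6},
    'Unit': {0: 'million', 1: 'million', 2: 'million', 3: 'million'},
}

# A dict of {column: {index: value}} maps directly onto a DataFrame
df = pd.DataFrame(data)
print(df)
```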

The dataframe contains population data for the capital cities of two countries, Nepal and Germany, for the years 2015 and 2020.

I want to create two new rows that show the population growth between 2015 and 2020, expressed as the ratio of the 2020 value to the 2015 value (e.g. 5/3 ≈ 1.67 for Nepal and 6/4 = 1.5 for Germany). The rows need to be in the same dataframe. In the new rows, the columns Country, Continent and Capital City should stay the same for the respective country. The Year value stays 2020, the Indicator needs to be "Population growth rate" and the Unit needs to be "times 2015 value". It should look like this:

   Country  Continent  Capital City  Year               Indicator     Value              Unit
0    Nepal       Asia     Kathmandu  2015              Population         3           million
1    Nepal       Asia     Kathmandu  2020              Population         5           million
2  Germany     Europe        Berlin  2015              Population         4           million
3  Germany     Europe        Berlin  2020              Population         6           million
4    Nepal       Asia     Kathmandu  2020  Population growth rate  1.666667  times 2015 value
5  Germany     Europe        Berlin  2020  Population growth rate       1.5  times 2015 value

How can I create these two new rows and append them to the original dataframe?

CodePudding user response:

Use groupby to compute the ratio per country, then append the result to the original dataframe:

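import pandas as pd

# One row per country: keep the last Year and divide the last Value by the first
out = df.groupby(['Country','Continent','Capital City']).agg({'Year':'last','Value':lambda x : x.iloc[-1]/x.iloc[0]}).reset_index()
out['Indicator'] = 'Population growth rate'
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
df = pd.concat([df, out])
df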
Out[16]: 
   Country Continent Capital City  ...               Indicator     Value     Unit
0    Nepal      Asia    Kathmandu  ...              Population  3.000000  million
1    Nepal      Asia    Kathmandu  ...              Population  5.000000  million
2  Germany    Europe       Berlin  ...              Population  4.000000  million
3  Germany    Europe       Berlin  ...              Population  6.000000  million
0  Germany    Europe       Berlin  ...  Population growth rate  1.500000      NaN
1    Nepal      Asia    Kathmandu  ...  Population growth rate  1.666667      NaN
[6 rows x 7 columns]
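
The output above still leaves Unit as NaN for the new rows, repeats the index labels 0 and 1, and lists Germany before Nepal because groupby sorts its keys. A possible variant that matches the layout requested in the question is sketched below. It assumes that the earliest Population row of each country is the 2015 value and the latest is the 2020 value; the names keys, ordered, growth and result are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['Nepal', 'Nepal', 'Germany', 'Germany'],
    'Continent': ['Asia', 'Asia', 'Europe', 'Europe'],
    'Capital City': ['Kathmandu', 'Kathmandu', 'Berlin', 'Berlin'],
    'Year': [2015, 2020, 2015, 2020],
    'Indicator': ['Population'] * 4,
    'Value': [3, 5, 4, 6],
    'Unit': ['million'] * 4,
})

keys = ['Country', 'Continent', 'Capital City']

# Stable sort by Year so that, within each group, the first row is the
# earliest year and the last row is the latest (assumed 2015 and 2020)
ordered = df[df['Indicator'] == 'Population'].sort_values('Year', kind='mergesort')

# sort=False keeps the countries in their order of first appearance
growth = (
    ordered.groupby(keys, sort=False)
    .agg(Year=('Year', 'last'),
         Value=('Value', lambda s: s.iloc[-1] / s.iloc[0]))
    .reset_index()
)
growth['Indicator'] = 'Population growth rate'
growth['Unit'] = 'times 2015 value'

# ignore_index=True renumbers the combined rows 0..5
result = pd.concat([df, growth], ignore_index=True)
print(result)
```

Because the ratio is a float, the whole Value column becomes float64 after the concat, so the original populations print as 3.0, 5.0 and so on.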