I have a pandas dataframe df which looks as follows:
   Country  Continent  Capital City  Year   Indicator  Value     Unit
0    Nepal       Asia     Kathmandu  2015  Population      3  million
1    Nepal       Asia     Kathmandu  2020  Population      5  million
2  Germany     Europe        Berlin  2015  Population      4  million
3  Germany     Europe        Berlin  2020  Population      6  million
The output of df.to_dict() is:
{'Country': {0: 'Nepal', 1: 'Nepal', 2: 'Germany', 3: 'Germany'},
'Continent': {0: 'Asia', 1: 'Asia', 2: 'Europe', 3: 'Europe'},
'Capital City': {0: 'Kathmandu', 1: 'Kathmandu', 2: 'Berlin', 3: 'Berlin'},
'Year': {0: 2015, 1: 2020, 2: 2015, 3: 2020},
'Indicator': {0: 'Population',
1: 'Population',
2: 'Population',
3: 'Population'},
'Value': {0: 3, 1: 5, 2: 4, 3: 6},
'Unit': {0: 'million', 1: 'million', 2: 'million', 3: 'million'}}
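To reproduce the frame locally, the dictionary above can be passed straight back to the DataFrame constructor, because the {column: {index: value}} layout produced by to_dict() is one of the shapes pd.DataFrame accepts. A minimal sketch; the variable name data is only for illustration:

```python
import pandas as pd

# The dictionary printed by df.to_dict() in the question
data = {'Country': {0: 'Nepal', 1: 'Nepal', 2: 'Germany', 3: 'Germany'},
        'Continent': {0: 'Asia', 1: 'Asia', 2: 'Europe', 3: 'Europe'},
        'Capital City': {0: 'Kathmandu', 1: 'Kathmandu', 2: 'Berlin', 3: 'Berlin'},
        'Year': {0: 2015, 1: 2020, 2: 2015, 3: 2020},
        'Indicator': {0: 'Population', 1: 'Population', 2: 'Population', 3: 'Population'},
        'Value': {0: 3, 1: 5, 2: 4, 3: 6},
        'Unit': {0: 'million', 1: 'million', 2: 'million', 3: 'million'}}

# Outer keys become columns, inner keys become the row index
df = pd.DataFrame(data)
print(df)
```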
The dataframe contains population data for the capital cities of two countries, Nepal and Germany, for the years 2015 and 2020.
I want to add two new rows that show the population growth rate between 2015 and 2020 (e.g. 5/3 ≈ 1.67 for Nepal and 6/4 = 1.5 for Germany). The rows need to be in the same dataframe. In the new rows, the Country, Continent and Capital City columns should stay the same as for the respective country, Year should be 2020, Indicator should be "Population growth rate", and Unit should be "times 2015 value". The result should look like this:
   Country  Continent  Capital City  Year               Indicator     Value              Unit
0    Nepal       Asia     Kathmandu  2015              Population         3           million
1    Nepal       Asia     Kathmandu  2020              Population         5           million
2  Germany     Europe        Berlin  2015              Population         4           million
3  Germany     Europe        Berlin  2020              Population         6           million
4    Nepal       Asia     Kathmandu  2020  Population growth rate  1.666667  times 2015 value
5  Germany     Europe        Berlin  2020  Population growth rate       1.5  times 2015 value
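The growth rate here is just each country's 2020 value divided by its 2015 value. A minimal sketch of that arithmetic using DataFrame.pivot, which selects the two years by label rather than by row position; it uses a cut-down copy of the question's data (only Country, Year and Value) so it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['Nepal', 'Nepal', 'Germany', 'Germany'],
    'Year': [2015, 2020, 2015, 2020],
    'Value': [3, 5, 4, 6],
})

# One row per country, one column per year (column labels are the int years)
wide = df.pivot(index='Country', columns='Year', values='Value')

# Growth rate = 2020 value / 2015 value, aligned by country
ratio = wide[2020] / wide[2015]
print(ratio)
```

The pivot index is sorted, so ratio lists Germany (1.5) before Nepal (about 1.667).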
How can I create these two rows and append them to the original dataframe?
Answer:
Do it with groupby, then append the new rows with pd.concat:
import pandas as pd
# stable sort by Year so each group's first row is 2015 and its last row is 2020
g = df.sort_values('Year', kind='stable').groupby(['Country', 'Continent', 'Capital City'], as_index=False, sort=False)
out = g.agg({'Year': 'last', 'Value': lambda x: x.iloc[-1] / x.iloc[0]})
out['Indicator'] = 'Population growth rate'
out['Unit'] = 'times 2015 value'
df = pd.concat([df, out], ignore_index=True)  # DataFrame.append was removed in pandas 2.0
df
Result:
   Country  Continent  Capital City  Year               Indicator     Value              Unit
0    Nepal       Asia     Kathmandu  2015              Population  3.000000           million
1    Nepal       Asia     Kathmandu  2020              Population  5.000000           million
2  Germany     Europe        Berlin  2015              Population  4.000000           million
3  Germany     Europe        Berlin  2020              Population  6.000000           million
4    Nepal       Asia     Kathmandu  2020  Population growth rate  1.666667  times 2015 value
5  Germany     Europe        Berlin  2020  Population growth rate  1.500000  times 2015 value
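The groupby answer picks the first and last row of each group by position, so it relies on each country having exactly one 2015 row and one 2020 row. If other years may be present, the two rows can instead be matched by their Year label with a merge. A minimal sketch under that assumption; add_growth_rows is a hypothetical helper name, and it assumes the column names from the question and that the input rows carry the indicator "Population":

```python
import pandas as pd

def add_growth_rows(df, base_year=2015, target_year=2020):
    """Hypothetical helper: append one growth-rate row per country.

    Base and target rows are matched by their Year label, not by
    position, so the input does not need to be sorted.
    """
    keys = ['Country', 'Continent', 'Capital City']
    pop = df[df['Indicator'] == 'Population']
    base = pop.loc[pop['Year'] == base_year, keys + ['Value']]
    target = pop.loc[pop['Year'] == target_year, keys + ['Value']]
    # Inner join: a country missing either year gets no growth-rate row
    both = target.merge(base, on=keys, suffixes=('_target', '_base'))
    new_rows = both[keys].assign(
        Year=target_year,
        Indicator='Population growth rate',
        Value=both['Value_target'] / both['Value_base'],
        Unit=f'times {base_year} value',
    )
    # ignore_index=True numbers the appended rows 4, 5, ... instead of 0, 1
    return pd.concat([df, new_rows], ignore_index=True)


df = pd.DataFrame({
    'Country': ['Nepal', 'Nepal', 'Germany', 'Germany'],
    'Continent': ['Asia', 'Asia', 'Europe', 'Europe'],
    'Capital City': ['Kathmandu', 'Kathmandu', 'Berlin', 'Berlin'],
    'Year': [2015, 2020, 2015, 2020],
    'Indicator': ['Population'] * 4,
    'Value': [3, 5, 4, 6],
    'Unit': ['million'] * 4,
})
result = add_growth_rows(df)
print(result)
```

Because the merge keeps the order of the target-year rows, Nepal's growth-rate row comes before Germany's, matching the layout asked for in the question.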