Home > Blockchain >  Pandas: Repeating list in column does not work
Pandas: Repeating list in column does not work

Time:12-11

I want to turn a dataframe from thisenter image description here

to this:

enter image description here

It took me a while to figure out the melt and transpose function to get to this

enter image description here

But I did not get to manage to apply the years from 1990 to 2019 in a repeating manner into for every of the 189 countries.

I tried:

 year_list = []
for year in range(1990, 2020,1):
    year_list.append(year)
years = pd.Series(year_list)
years

and then

df['year'] = years.repeat(30)

(I need to repeat it 30 times, because the frame consists of 5670 rows = 189 countries * 29 years)

I got this error message:

ValueError: cannot reindex on an axis with duplicate labels

Googling this error does not help.

CodePudding user response:

One approach could be as follows:

Sample data

import pandas as pd
import numpy as np

data = {'country': ['Afghanistan','Angola']}

data.update({k: np.random.rand() for k in range(1990,1993)})

df = pd.DataFrame(data)

print(df)

       country      1990      1991      1992
0  Afghanistan  0.103589  0.950523  0.323925
1       Angola  0.103589  0.950523  0.323925

Code

res = (df.set_index('country')
       .unstack()
       .sort_index(level=1)
       .reset_index(drop=False)
       .rename(columns={'country': 'geo', 
                        'level_0': 'time', 
                        0: 'hdi_human_development_index'})
       )

print(res)

   time          geo  hdi_human_development_index
0  1990  Afghanistan                     0.103589
1  1991  Afghanistan                     0.950523
2  1992  Afghanistan                     0.323925
3  1990       Angola                     0.103589
4  1991       Angola                     0.950523
5  1992       Angola                     0.323925

Explanation

  • Use df.set_index on column country and apply df.unstack to add the years from the column names to the index.
  • Now, we use df.sort_index on level=1 to get the countries in alphabetical order.
  • Finally, we use df.reset_index with drop parameter set to False to get the index back as columns, and we chain df.rename to customize the column names.
  • Related