Home > Back-end >  Create a pivot table with yearly index for each value
Create a pivot table with yearly index for each value

Time:10-28

I'm currently dealing with a worldbank dataset and I would like to pivot it to analyse it better.

Currently, it's in a form like this :

df = pd.DataFrame({'Country Name' : ['A','A','A','B','B','B','C','C','C'],'Indicator' : ['X','Y','Z','X','Y','Z','X','Y','Z'],'2010' : [1,2,3,4,5,6,7,8,9],'2011' : [9,8,7,6,5,4,3,2,1]}) 
print(df)



Country Name    Indicator   2010    2011
0   A               X         1       9
1   A               Y         2       8
2   A               Z         3       7
3   B               X         4       6
4   B               Y         5       5
5   B               Z         6       4
6   C               X         7       3
7   C               Y         8       2
8   C               Z         9       1

I simplified it, there's more columns with more yearly value but that's good enough to explain. Now I would like to get that table to pivot, so that I could get the indicators in columns, with the countries in index and the yearly values for the values. However for that I would like to have a df like this :

year indicator1 indicator 2 indicator3
Country A 2010 values . .
2011 . . .
Country B 2010 . . .
2011 . . .
Country C 2010 . . .

But since the values are stored in the year columns, I don't know how to transform it to get to this disposition

I tried doing it like this :

indicators = data.pivot_table(index='Country Name',columns='Indicator Name', values='2011', dropna=False)

This ofc works, but I just get the values for the year 2011 and I don't really want to create a dataframe for each year.

But I can't add the name of the year columns in the index as it creates an enormous dataframe for some reason, and if i just add them in the values it will only create more columns, with some kind of index on the columns to separate the years.

I don't know if there's a method to do what I'm looking for but I appreciate any help !

Thanks

CodePudding user response:

melt before pivot_table:

(df.melt(['Country Name', 'Indicator'], var_name='Year')
   .pivot_table(index=['Country Name', 'Year'],
                columns='Indicator', values='value')
)

Or reshape with stack/unstack (requires unique values, unlike pivot_table):

(df.set_index(['Country Name', 'Indicator'])
   .rename_axis(columns='Year').stack().unstack('Indicator')
)

output:

Indicator          X  Y  Z
Country Name Year         
A            2010  1  2  3
             2011  9  8  7
B            2010  4  5  6
             2011  6  5  4
C            2010  7  8  9
             2011  3  2  1
  • Related