I'm currently dealing with a worldbank dataset and I would like to pivot it to analyse it better.
Currently, it's in a form like this :
df = pd.DataFrame({'Country Name' : ['A','A','A','B','B','B','C','C','C'],'Indicator' : ['X','Y','Z','X','Y','Z','X','Y','Z'],'2010' : [1,2,3,4,5,6,7,8,9],'2011' : [9,8,7,6,5,4,3,2,1]})
print(df)
Country Name Indicator 2010 2011
0 A X 1 9
1 A Y 2 8
2 A Z 3 7
3 B X 4 6
4 B Y 5 5
5 B Z 6 4
6 C X 7 3
7 C Y 8 2
8 C Z 9 1
I simplified it, there's more columns with more yearly value but that's good enough to explain. Now I would like to get that table to pivot, so that I could get the indicators in columns, with the countries in index and the yearly values for the values. However for that I would like to have a df like this :
year | indicator1 | indicator 2 | indicator3 | |
---|---|---|---|---|
Country A | 2010 | values | . | . |
2011 | . | . | . | |
Country B | 2010 | . | . | . |
2011 | . | . | . | |
Country C | 2010 | . | . | . |
But since the values are stored in the year columns, I don't know how to transform it to get to this disposition
I tried doing it like this :
indicators = data.pivot_table(index='Country Name',columns='Indicator Name', values='2011', dropna=False)
This ofc works, but I just get the values for the year 2011 and I don't really want to create a dataframe for each year.
But I can't add the name of the year columns in the index as it creates an enormous dataframe for some reason, and if i just add them in the values it will only create more columns, with some kind of index on the columns to separate the years.
I don't know if there's a method to do what I'm looking for but I appreciate any help !
Thanks
CodePudding user response:
melt
before pivot_table
:
(df.melt(['Country Name', 'Indicator'], var_name='Year')
.pivot_table(index=['Country Name', 'Year'],
columns='Indicator', values='value')
)
Or reshape with stack
/unstack
(requires unique values, unlike pivot_table
):
(df.set_index(['Country Name', 'Indicator'])
.rename_axis(columns='Year').stack().unstack('Indicator')
)
output:
Indicator X Y Z
Country Name Year
A 2010 1 2 3
2011 9 8 7
B 2010 4 5 6
2011 6 5 4
C 2010 7 8 9
2011 3 2 1