Home > OS >  I have a table with 50 columns - 1 for each year. How can I reshape the table so that the data poin
I have a table with 50 columns - 1 for each year. How can I reshape the table so that the data poin

Time:06-18

I'm relatively new to python and having an issue that I cannot wrap my head around.

I have a collecting of economic indicators that I am looking to analyse for correlations vs certain events. I am trying to format the data so that I can analyse using matplotlib.

Currently the data is in a format like this (there are different tables for different metrics, eg, GDP growth:

Country Name 1950 1951 ... 2020 2021
Country 1 X0 X1 ... X70 X71
Country 2 Y0 Y1 ... Y70 Y71

I'm having trouble working out how to put this into a visualisation in a straightforward way. I would like to plot the following on a scatterplot:

  • Years along the x-axis
  • GDP growth along the y-axis
  • Country name as the different series

The simplest way I can think of doing this is reformatting the data so that all the GDP growth data is in one column, along with attributes for Country Name and Year. So the data would look something like this, so that each data point becomes a row:

Country Name Year GDP Growth
Country 1 1950 X0
Country 1 1951 X1
... ... ...
Country 2 2020 Y70
Country 2 2021 Y71

Any suggestions on an efficient (and elegant?) way to do this in python? Any method I can think of ends up being a huge and manual unertaking as there are over 10,000 individual country/year data points.

CodePudding user response:

Melting it seems to work.

columns = ['Country Name',  '1950', '1951', '2020', '2021']
data = [['Country2',    'Y0','Y1','Y70','Y71'],
['Country1',    'X0','X1','X70','X71']]

df = pd.DataFrame(data,columns=columns)

>>> df
  Country Name 1950 1951 2020 2021
0     Country2   Y0   Y1  Y70  Y71
1     Country1   X0   X1  X70  X71
>>>
>>> df.melt(id_vars=['Country Name'])
  Country Name variable value
0     Country2     1950    Y0
1     Country1     1950    X0
2     Country2     1951    Y1
3     Country1     1951    X1
4     Country2     2020   Y70
5     Country1     2020   X70
6     Country2     2021   Y71
7     Country1     2021   X71
>>>
>>> dg = df.melt(id_vars=['Country Name'],var_name='Year', value_name='GDP Growth')
>>> dg
  Country Name     Year GDP Growthrowth
0     Country2     1950              Y0
1     Country1     1950              X0
2     Country2     1951              Y1
3     Country1     1951              X1
4     Country2     2020             Y70
5     Country1     2020             X70
6     Country2     2021             Y71
7     Country1     2021             X71
  • Related