I'm relatively new to python and having an issue that I cannot wrap my head around.
I have a collecting of economic indicators that I am looking to analyse for correlations vs certain events. I am trying to format the data so that I can analyse using matplotlib.
Currently the data is in a format like this (there are different tables for different metrics, eg, GDP growth:
Country Name | 1950 | 1951 | ... | 2020 | 2021 |
---|---|---|---|---|---|
Country 1 | X0 | X1 | ... | X70 | X71 |
Country 2 | Y0 | Y1 | ... | Y70 | Y71 |
I'm having trouble working out how to put this into a visualisation in a straightforward way. I would like to plot the following on a scatterplot:
- Years along the x-axis
- GDP growth along the y-axis
- Country name as the different series
The simplest way I can think of doing this is reformatting the data so that all the GDP growth data is in one column, along with attributes for Country Name and Year. So the data would look something like this, so that each data point becomes a row:
Country Name | Year | GDP Growth |
---|---|---|
Country 1 | 1950 | X0 |
Country 1 | 1951 | X1 |
... | ... | ... |
Country 2 | 2020 | Y70 |
Country 2 | 2021 | Y71 |
Any suggestions on an efficient (and elegant?) way to do this in python? Any method I can think of ends up being a huge and manual unertaking as there are over 10,000 individual country/year data points.
CodePudding user response:
Melting it seems to work.
columns = ['Country Name', '1950', '1951', '2020', '2021']
data = [['Country2', 'Y0','Y1','Y70','Y71'],
['Country1', 'X0','X1','X70','X71']]
df = pd.DataFrame(data,columns=columns)
>>> df
Country Name 1950 1951 2020 2021
0 Country2 Y0 Y1 Y70 Y71
1 Country1 X0 X1 X70 X71
>>>
>>> df.melt(id_vars=['Country Name'])
Country Name variable value
0 Country2 1950 Y0
1 Country1 1950 X0
2 Country2 1951 Y1
3 Country1 1951 X1
4 Country2 2020 Y70
5 Country1 2020 X70
6 Country2 2021 Y71
7 Country1 2021 X71
>>>
>>> dg = df.melt(id_vars=['Country Name'],var_name='Year', value_name='GDP Growth')
>>> dg
Country Name Year GDP Growthrowth
0 Country2 1950 Y0
1 Country1 1950 X0
2 Country2 1951 Y1
3 Country1 1951 X1
4 Country2 2020 Y70
5 Country1 2020 X70
6 Country2 2021 Y71
7 Country1 2021 X71