I want to make a plot with TOT Population on the x axis, Year on the y axis, and two lines: one for Hispanic and one for not Hispanic. The dataframe looks like this:
                                        ID Race  ID Ethnicity  ID Year  Hispanic Population Moe
Ethnicity               TOT Population
Hispanic or Latino      9825                4.0           1.0   2013.0                   2345.0
                        12234               4.0           1.0   2014.0                   2660.0
                        12437               4.0           1.0   2018.0                   2429.0
                        13502               4.0           1.0   2016.0                   3254.0
                        14025               4.0           1.0   2019.0                   2644.0
...                     ...                 ...           ...      ...                      ...
Not Hispanic or Latino  14616636            0.0           0.0   2017.0                   7788.0
                        14725729            0.0           0.0   2016.0                   8629.0
                        14815122            0.0           0.0   2015.0                   7888.0
                        14849129            0.0           0.0   2014.0                   7495.0
                        14884539            0.0           0.0   2013.0                   6586.0
I got this dataframe from a groupby on Ethnicity and TOT Population. Can someone help me plot it with matplotlib? Thank you!
CodePudding user response:
I believe there are two parts to your question. The first is to convert the grouped data into a format that matplotlib understands (basically, flatten the table), and the second is to plot the two lines in one graph.
The initial data:
>>> df
                                        ID Race  ID Ethnicity  ID Year  Hispanic...
Ethnicity               TOT Population
Hispanic or Latino      9825                  4             1     2013         2345
                        12234                 4             1     2014         2660
                        12437                 4             1     2018         2429
                        13502                 4             1     2016         3254
                        14025                 4             1     2019         2644
Not Hispanic or Latino  14616636              0             0     2017         7788
                        14725729              0             0     2016         8629
                        14815122              0             0     2015         7888
                        14849129              0             0     2014         7495
                        14884539              0             0     2013         6586
First, use reset_index() to flatten the table, which turns the two index levels (Ethnicity and TOT Population) back into ordinary columns:
>>> df2 = df.reset_index()
>>> df2
   Ethnicity               TOT Population  ID Race  ID Ethnicity  ID Year  Hispanic Population Moe
0  Hispanic or Latino                9825        4             1     2013                     2345
1  Hispanic or Latino               12234        4             1     2014                     2660
2  Hispanic or Latino               12437        4             1     2018                     2429
3  Hispanic or Latino               13502        4             1     2016                     3254
4  Hispanic or Latino               14025        4             1     2019                     2644
5  Not Hispanic or Latino        14616636        0             0     2017                     7788
6  Not Hispanic or Latino        14725729        0             0     2016                     8629
7  Not Hispanic or Latino        14815122        0             0     2015                     7888
8  Not Hispanic or Latino        14849129        0             0     2014                     7495
9  Not Hispanic or Latino        14884539        0             0     2013                     6586
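If you want to try the flattening step without the original data, here is a minimal sketch. It assumes the grouped frame has a two-level index (Ethnicity, TOT Population), as the printout suggests; the four sample rows are copied from the table above, and only the ID Year column is kept for brevity.

```python
import pandas as pd

# Sample rows copied from the table above; only one data column is kept for brevity.
rows = [
    ("Hispanic or Latino", 9825, 2013),
    ("Hispanic or Latino", 12234, 2014),
    ("Not Hispanic or Latino", 14849129, 2014),
    ("Not Hispanic or Latino", 14884539, 2013),
]
raw = pd.DataFrame(rows, columns=["Ethnicity", "TOT Population", "ID Year"])

# Assumption: the grouped frame has a two-level index like this one.
df = raw.set_index(["Ethnicity", "TOT Population"])

# reset_index() moves both index levels back into ordinary columns.
df2 = df.reset_index()
print(df2.columns.tolist())
```

After this, df2['Ethnicity'] and df2['TOT Population'] can be used like any other column, which is what the plotting step needs.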
You then plot the line graph.
import matplotlib.pyplot as plt

# Boolean masks selecting the rows of each group
hispanic = df2['Ethnicity'] == 'Hispanic or Latino'
not_hispanic = df2['Ethnicity'] == 'Not Hispanic or Latino'

plt.figure(figsize=(20, 5))
plt.plot(df2.loc[hispanic, 'TOT Population'], df2.loc[hispanic, 'ID Year'], label='Hispanic or Latino')
plt.plot(df2.loc[not_hispanic, 'TOT Population'], df2.loc[not_hispanic, 'ID Year'], '-.', label='Not Hispanic or Latino')
plt.ticklabel_format(style='plain')  # avoid scientific notation on the tick labels
plt.xlabel("TOT Population")
plt.ylabel("Year")
plt.title('My plot')
plt.legend()
plt.show()
Your graph will look like this. You can change it further as you need. Note that the Hispanic population is very small compared to the non-Hispanic population, so the two lines sit far apart on the x axis; that is why the figure was made rather wide. You can plot just one group to see its ups and downs better.
[Image: output graph]
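One way to see each group's ups and downs is to give each group its own panel, so that each gets its own x range. The following is only a sketch of that idea: the sample values are copied from the flattened table above, and sorting by year before plotting is my own choice (it connects the points in chronological order), not something the original code does.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Sample values copied from the flattened table above.
df2 = pd.DataFrame({
    "Ethnicity": ["Hispanic or Latino"] * 5 + ["Not Hispanic or Latino"] * 5,
    "TOT Population": [9825, 12234, 12437, 13502, 14025,
                       14616636, 14725729, 14815122, 14849129, 14884539],
    "ID Year": [2013, 2014, 2018, 2016, 2019, 2017, 2016, 2015, 2014, 2013],
})

groups = list(df2.groupby("Ethnicity"))

# One panel per group; sharey keeps the year axis aligned,
# while each panel keeps its own x range.
fig, axes = plt.subplots(1, len(groups), figsize=(12, 4), sharey=True)
for ax, (name, grp) in zip(axes, groups):
    grp = grp.sort_values("ID Year")  # connect the points in chronological order
    ax.plot(grp["TOT Population"], grp["ID Year"], marker="o")
    ax.ticklabel_format(style="plain")  # avoid scientific notation
    ax.set_title(name)
    ax.set_xlabel("TOT Population")
axes[0].set_ylabel("Year")
fig.tight_layout()
# plt.show()  # uncomment to display the figure interactively
```

With separate panels the figure no longer has to be very wide, because the small and large populations are not forced onto the same x axis.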