I have 4 years of seasonal data I want to plot on one graph using matplotlib.
My data is in a pandas dataframe and looks like this:
```
Total  Year  Day
5      2017  10/29
4      2016  10/30
3      2018  10/31
5      2019  10/31
10     2017  10/31
```
The 'Year' and 'Day' columns are of type str; the 'Total' column is of type int.
I want the graph to have 4 lines, one for each year, with 'Total' on the y axis and 'Day' on the x axis. I know how to do this in R using ggplot2, but I can't figure out how to do it with Matplotlib in Python.
Answer:
First of all, you need to create a 'Date' column:
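```python
# Split each "MM/DD" string into separate month and day strings
df['Month'] = df['Day'].apply(lambda x: x.split('/')[0])
df['Day'] = df['Day'].apply(lambda x: x.split('/')[1])
# Rebuild a full date in the reference year 2020 (strings are joined with +)
df['Date'] = '2020' + '-' + df['Month'] + '-' + df['Day']
df['Date'] = pd.to_datetime(df['Date'])
```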
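As an alternative sketch (not the answer's original code), the split-and-rejoin steps can be collapsed into one vectorized call that prefixes the 2020 reference year to each 'MM/DD' string and parses it with an explicit format. It assumes every 'Day' value has the form MM/DD; unlike the code above, it leaves the original 'Day' column untouched.

```python
import pandas as pd

# Sample rows from the question: 'Year' and 'Day' are strings, 'Total' is int
df = pd.DataFrame({
    "Total": [5, 4, 3, 5, 10],
    "Year": ["2017", "2016", "2018", "2019", "2017"],
    "Day": ["10/29", "10/30", "10/31", "10/31", "10/31"],
})

# Prefix the reference year and parse all rows in one call; the explicit
# format avoids any guessing about which field is the month and which the day
df["Date"] = pd.to_datetime("2020/" + df["Day"], format="%Y/%m/%d")

print(df)
```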
Note one important point: you want data from different years to be plotted along the same x axis. To achieve this, the rows from every year have to be mapped onto the same reference year. I chose 2020 as the reference year because it is a leap year, so it contains every possible date, Feb 29 included. The 'Date' column therefore holds the actual day and month of each row, but always with 2020 as the year.
This is needed only for plotting. You can then hide the year on the x axis by formatting the tick labels to show only month and day.
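The leap-year argument is easy to check. In this sketch, `to_reference_date` is a hypothetical helper (not part of pandas or the answer's code) that maps an 'MM/DD' string onto a chosen year using the standard library's `datetime.date`: February 29 is accepted for 2020 but rejected for a non-leap year such as 2019.

```python
from datetime import date


def to_reference_date(month_day: str, reference_year: int) -> date:
    """Hypothetical helper: map an 'MM/DD' string onto the given year."""
    month, day = (int(part) for part in month_day.split("/"))
    return date(reference_year, month, day)


# 2020 is a leap year, so February 29 exists in it
print(to_reference_date("02/29", 2020))  # 2020-02-29

# 2019 is not a leap year, so the same input cannot be represented
try:
    to_reference_date("02/29", 2019)
except ValueError as err:
    print("2019 fails:", err)
```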
Finally, loop over the years and plot the data for each one:
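```python
# md is matplotlib.dates (see the imports in the complete code below)
# Sort by date so each year's line is drawn left to right
df = df.sort_values('Date')

fig, ax = plt.subplots()
# One line per year, all sharing the 2020-based 'Date' x values
for year in sorted(df['Year'].unique()):
    filt = df['Year'] == year
    ax.plot(df.loc[filt, 'Date'], df.loc[filt, 'Total'], label=year)
# Tick every 15 days and show only month/day, hiding the reference year
ax.xaxis.set_major_locator(md.DayLocator(interval=15))
ax.xaxis.set_major_formatter(md.DateFormatter('%m/%d'))
plt.setp(ax.xaxis.get_majorticklabels(), rotation=90)
ax.legend(frameon=True)
ax.set_xlabel('Date')
ax.set_ylabel('Total')
plt.show()
```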
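If you prefer something closer to ggplot's one-line-per-group style, another option (a different technique from the loop above) is to pivot the data so that each year becomes its own column and let pandas draw one line per column. This sketch assumes the 'Date' column has been built as above and that each (Date, Year) pair occurs at most once, since `DataFrame.pivot` raises on duplicates. `x_compat=True` is passed so pandas keeps plain Matplotlib date units and `DateFormatter` labels the ticks correctly.

```python
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as md

# Sample rows from the question, with 'Date' mapped onto the 2020 reference year
df = pd.DataFrame({
    "Total": [5, 4, 3, 5, 10],
    "Year": ["2017", "2016", "2018", "2019", "2017"],
    "Day": ["10/29", "10/30", "10/31", "10/31", "10/31"],
})
df["Date"] = pd.to_datetime("2020/" + df["Day"], format="%Y/%m/%d")

# Wide layout: one row per date, one column per year (NaN where a year has no value)
wide = df.pivot(index="Date", columns="Year", values="Total").sort_index()

fig, ax = plt.subplots()
# DataFrame.plot draws one line per column, labelled with the column name
wide.plot(ax=ax, marker="o", x_compat=True)
ax.xaxis.set_major_formatter(md.DateFormatter("%m/%d"))
ax.set_xlabel("Date")
ax.set_ylabel("Total")
```

Call `plt.show()` afterwards to display the figure. Because `pivot` fills missing year/date combinations with NaN, a year's line breaks wherever it has no value (in the sample, 2017 has no 10/30 row), which is why markers are drawn; the loop above instead connects each year's existing points directly.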
Complete code
```python
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as md

df = pd.read_csv(r'data\data.csv')

# Map every row onto the 2020 reference year (a leap year, so Feb 29 is valid)
df['Month'] = df['Day'].apply(lambda x: x.split('/')[0])
df['Day'] = df['Day'].apply(lambda x: x.split('/')[1])
df['Date'] = '2020' + '-' + df['Month'] + '-' + df['Day']
df['Date'] = pd.to_datetime(df['Date'])
# Sort by date so each year's line is drawn left to right
df = df.sort_values('Date')

fig, ax = plt.subplots()
# One line per year, all sharing the 2020-based 'Date' x values
for year in sorted(df['Year'].unique()):
    filt = df['Year'] == year
    ax.plot(df.loc[filt, 'Date'], df.loc[filt, 'Total'], label=year)

# Tick every 15 days and show only month/day, hiding the reference year
ax.xaxis.set_major_locator(md.DayLocator(interval=15))
ax.xaxis.set_major_formatter(md.DateFormatter('%m/%d'))
plt.setp(ax.xaxis.get_majorticklabels(), rotation=90)
ax.legend(frameon=True)
ax.set_xlabel('Date')
ax.set_ylabel('Total')
plt.show()
```
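The complete code reads `data\data.csv`, whose exact layout isn't shown in the question. As a minimal sketch, assuming a comma-separated file with a header row (the inline string below is a hypothetical stand-in for that file): `read_csv` would infer 'Year' as integers, while passing `dtype` keeps it a string, matching the column types described in the question. The plotting code works either way, since Matplotlib converts line labels to strings.

```python
import io

import pandas as pd

# Hypothetical stand-in for the contents of data\data.csv
csv_text = """Total,Year,Day
5,2017,10/29
4,2016,10/30
3,2018,10/31
5,2019,10/31
10,2017,10/31
"""

# Without dtype=, 'Year' would be parsed as int64; this keeps it as text
df = pd.read_csv(io.StringIO(csv_text), dtype={"Year": str})
print(df.dtypes)
```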