Plot lines in matplotlib based on discrete discriminator

Time:11-18

I have 4 years of seasonal data I want to plot on one graph using matplotlib.
My data is in a pandas dataframe and looks like this:

Total  Year  Day
5      2017  10/29
4      2016  10/30
3      2018  10/31
5      2019  10/31
10     2017  10/31

The 'Year' and 'Day' columns are type str. The 'Total' column is type int.
I want the graph to have 4 lines: one line for each year. I want 'Total' on the Y axis and 'Day' on the X axis. I know how to do this in R using ggplot2, but I can't figure out how to do it with Matplotlib in Python.

CodePudding user response:

First of all, you have to create a 'Date' column:

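# Split 'MM/DD' into separate month and day strings
df['Month'] = df['Day'].apply(lambda x: x.split('/')[0])
df['Day'] = df['Day'].apply(lambda x: x.split('/')[1])
# Build a date string on the reference year 2020, then parse it
df['Date'] = '2020' + '-' + df['Month'] + '-' + df['Day']
df['Date'] = pd.to_datetime(df['Date'])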

Note one important point: you want different years plotted along the same x axis. To achieve this, the data from every year has to be mapped onto a single reference year. I chose 2020 as the reference because it is a leap year, so it contains every possible date, Feb 29 included. The 'Date' column therefore holds each row's actual month and day, but always with 2020 as the year.
This is needed only for plotting. You can then hide the year along the x axis by formatting the x ticks to show only month and day.
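
To see why the reference year should be a leap year, here is a small check. It is only a sketch that assumes pandas is installed; the variable names are illustrative and not part of the answer's code.

```python
import pandas as pd

# Feb 29 exists in the leap year 2020, so it parses to a valid timestamp.
leap = pd.to_datetime('2020-02-29')

# 2019 has no Feb 29; errors='coerce' returns NaT instead of raising.
non_leap = pd.to_datetime('2019-02-29', errors='coerce')

print(leap, non_leap)
```

With a non-leap reference year, any Feb 29 row in the data would be lost or would raise an error during parsing.
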
Finally, loop over the years and plot each one:

fig, ax = plt.subplots()

for year in df['Year'].unique():
    filt = df['Year'] == year
    ax.plot(df[filt]['Date'], df[filt]['Total'], label = year)

ax.xaxis.set_major_locator(md.DayLocator(interval = 15))
ax.xaxis.set_major_formatter(md.DateFormatter('%m/%d'))
plt.setp(ax.xaxis.get_majorticklabels(), rotation = 90)

ax.legend(frameon = True)
ax.set_xlabel('Date')
ax.set_ylabel('Total')

plt.show()

Complete code

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as md


df = pd.read_csv(r'data\data.csv')

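# Split 'MM/DD' into separate month and day strings
df['Month'] = df['Day'].apply(lambda x: x.split('/')[0])
df['Day'] = df['Day'].apply(lambda x: x.split('/')[1])
# Build a date string on the reference year 2020, then parse it
df['Date'] = '2020' + '-' + df['Month'] + '-' + df['Day']
df['Date'] = pd.to_datetime(df['Date'])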


fig, ax = plt.subplots()

for year in df['Year'].unique():
    filt = df['Year'] == year
    ax.plot(df[filt]['Date'], df[filt]['Total'], label = year)

ax.xaxis.set_major_locator(md.DayLocator(interval = 15))
ax.xaxis.set_major_formatter(md.DateFormatter('%m/%d'))
plt.setp(ax.xaxis.get_majorticklabels(), rotation = 90)

ax.legend(frameon = True)
ax.set_xlabel('Date')
ax.set_ylabel('Total')

plt.show()
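
If you do not have the CSV file at hand, the same idea can be tried on the five sample rows from the question. The following is a sketch under a few assumptions that are not in the answer above: the date is parsed in one step with a `format` string instead of splitting, rows are sorted by date so each line is drawn left to right, and markers are added because some years have only a single point in the sample.

```python
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as md

# Sample rows from the question; 'Year' and 'Day' are strings, 'Total' is int.
df = pd.DataFrame({
    'Total': [5, 4, 3, 5, 10],
    'Year': ['2017', '2016', '2018', '2019', '2017'],
    'Day': ['10/29', '10/30', '10/31', '10/31', '10/31'],
})

# Map every row onto the leap reference year 2020 so all years share one x axis.
df['Date'] = pd.to_datetime('2020-' + df['Day'], format='%Y-%m/%d')

fig, ax = plt.subplots()

# One line per year; sorting by date keeps each line from doubling back.
for year, group in df.sort_values('Date').groupby('Year'):
    ax.plot(group['Date'], group['Total'], marker='o', label=year)

# Show only month/day on the ticks, hiding the dummy reference year.
ax.xaxis.set_major_formatter(md.DateFormatter('%m/%d'))
ax.legend(title='Year')
ax.set_xlabel('Day')
ax.set_ylabel('Total')
```

Call plt.show() afterwards to display the figure, or fig.savefig(...) to write it to a file.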
