Home > Software engineering >  Plotting time series box and whisker plot with missing date values for origin destination pairs
Plotting time series box and whisker plot with missing date values for origin destination pairs

Time:02-19

I have the following data set:

df.head(7)
     Origin        Dest     Date            Quantity
0   Atlanta        LA       2021-09-09      1
1   Atlanta        LA       2021-09-11      4
2   Atlanta        Chicago  2021-09-16      1
3   Atlanta        Seattle  2021-09-27      12
4   Seattle        LA       2021-09-29      2
5   Seattle        Atlanta  2021-09-13      2
6   Seattle        Newark   2021-09-17      7

In short, this table represents the number of items (Quantity) that were sent from a given origin to a given destination on a given date. The table contains 1 month of data. This table was read with:

shipments = pd.read_csv('shipments.csv', parse_dates=['Date'])

Note that this is a sparse table: if Quantity=0 for a particular (Origin,Dest,Date) pair then this row is not included in the table. As per example, on 2021-09-10 no items were sent from Atlanta to LA this row is not included in the data.

I would like to visualize this data using time series box and whisker plots. The x-axis of my graph should show the day, and Quantity should be on the y-axis. A boxplot should represent the various percentiles aggregated over all (origin-destination) pairs.

Similarly, would it be possible to create a graph which, instead of every day, only shows Monday-Sunday on the x-axis (and hence shows the results per day of the week)?

To generate the rows with missing data I used the following code:

table = pd.pivot_table(data=shipments, index='Date', columns=['Origin','Dest'], values='Quantity', fill_value=0)
idx = pd.date_range('2021-09-06','2021-10-10')
table = table.reindex(idx,fill_value=0)

CodePudding user response:

You could transpose the table dataframe, and use that as input for a sns.boxplot. And you could create a similar table for the day of the week. Note that with many zeros, the boxplot might look a bit strange.

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

# first create some test data, somewhat similar to the given data
N = 1000
cities = ['Atlanta', 'LA', 'Chicago', 'Seattle', 'Newark']
shipments = pd.DataFrame({'Origin': np.random.choice(cities, N),
                          'Dest': np.random.choice(cities, N),
                          'Date': np.random.choice(pd.date_range('2021-09-06', '2021-10-10'), N),
                          'Quantity': (np.random.uniform(1, 4, N) ** 3).astype(int)})

fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(15, 5), gridspec_kw={'width_ratios': [3, 1]})

# create boxplots for each day
table_month = pd.pivot_table(data=shipments, index='Date', columns=['Origin', 'Dest'], values='Quantity', fill_value=0)
idx = pd.date_range('2021-09-06', '2021-10-10')
table_month = table_month.reindex(idx, fill_value=0)
sns.boxplot(data=table_month.T, ax=ax1)
labels = [day.strftime('%d\n%b %Y') if i == 0 or day.day == 1 else day.strftime('%d')
          for i, day in enumerate(table_month.index)]
ax1.set_xticklabels(labels)

# create boxplots for each day of the week
table_dow = pd.pivot_table(data=shipments, index=shipments['Date'].dt.dayofweek, columns=['Origin', 'Dest'],
                           values='Quantity', fill_value=0)
table_dow = table_dow.reindex(range(7), fill_value=0)
sns.boxplot(data=table_dow.T, ax=ax2)
labels = ['Mon', 'Tues', 'Wed', 'Thurs', 'Fri', 'Sat', 'Sun']
ax2.set_xticklabels(labels)
ax2.set_xlabel('')  # remove superfluous x label

fig.tight_layout()
plt.show()

boxplot for each day

  • Related