The Problem:
I have a dataframe df that looks like this:
date                                value  msg_type
2022-03-15 08:15:10+00:00           122    None
2022-03-15 08:25:10+00:00           125    None
2022-03-15 08:30:10+00:00           126    None
2022-03-15 08:30:26.542134+00:00    127    ANNOUNCEMENT
2022-03-15 08:35:10+00:00           128    None
2022-03-15 08:40:10+00:00           122    None
2022-03-15 08:45:09+00:00           127    None
2022-03-15 08:50:09+00:00           133    None
2022-03-15 08:55:09+00:00           134    None
....
2022-03-16 09:30:09+00:00           132    None
2022-03-16 09:30:13.234425+00:00    135    ANNOUNCEMENT
2022-03-16 09:35:09+00:00           130    None
2022-03-16 09:40:09+00:00           134    None
2022-03-16 09:45:09+00:00           135    None
2022-03-16 09:50:09+00:00           134    None
The value data occurs in roughly 5-minute intervals, but messages can occur at any time. I am trying to plot one line of values per day, where the x-axis ranges from t = -2 hours to t = 8 hours, and the ANNOUNCEMENT occurs at t = 0 (see image below).
So, for example, if an ANNOUNCEMENT occurs at 8:30AM on 3/15 and again at 9:30AM on 3/16, there should be two lines:
- one line for 3/15 that plots data from 6:30AM to 4:30PM, and
- one line for 3/16 that plots data from 7:30AM to 5:30PM,
both sharing the same x-axis ranging from -2 to 8, with ANNOUNCEMENT at t = 0.
What I've Tried:
I am able to do this currently by finding the index position of an announcement (e.g. if it occurs at row 298, then announcement_index = 298), generating an array of 120 numbers from -24 to 95 (representing 10 hours at 5 minutes per step: x = np.arange(-24, 96, 1)), then plotting
sns.lineplot(x=x, y=df['value'].iloc[announcement_index-24:announcement_index+96])
While this does mostly work (see image below), I suspect it's not the correct way to go about it. Specifically, adding more info to the plot (like a different set of 'value' markers) at specific times is difficult, because I would need to convert each timestamp into this arbitrary -24 to 96 range.
How can I make this same plot but by utilizing the datetime index instead? Thank you so much!
CodePudding user response:
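Assuming the index has already been converted to a DatetimeIndex, build an interval around every timestamp, then for each ANNOUNCEMENT slice out its interval and subtract the announcement time from the window's index, so that the x-axis becomes time since the announcement.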
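The core step can be illustrated in isolation. This is only a minimal sketch on three made-up timestamps (idx and announcement are hypothetical names, not part of the question's data): subtracting a Timestamp from a DatetimeIndex gives a TimedeltaIndex, which can then be converted to hours.

```python
import pandas as pd

# made-up, uniformly formatted timestamps (assumption: UTC, like the question's data)
idx = pd.to_datetime([
    '2022-03-15 08:00:00+00:00',
    '2022-03-15 08:30:00+00:00',
    '2022-03-15 09:15:00+00:00',
], utc=True)
announcement = pd.Timestamp('2022-03-15 08:30:00+00:00')

offsets = idx - announcement             # TimedeltaIndex relative to the announcement
hours = offsets.total_seconds() / 3600   # float hours, negative before the announcement
print(list(hours))
```

Because the offsets are computed from real timestamps, any other event can be placed on the same axis with the same subtraction.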
Full code:
import io
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

s = '''
date,value,msg_type
2022-03-15 08:15:10+00:00,122,None
2022-03-15 08:25:10+00:00,125,None
2022-03-15 08:30:10+00:00,126,None
2022-03-15 08:30:26.542134+00:00,127,ANNOUNCEMENT
2022-03-15 08:35:10+00:00,128,None
2022-03-15 08:40:10+00:00,122,None
2022-03-15 08:45:09+00:00,127,None
2022-03-15 08:50:09+00:00,133,None
2022-03-15 08:55:09+00:00,134,None
2022-03-16 09:30:09+00:00,132,None
2022-03-16 09:30:13.234425+00:00,135,ANNOUNCEMENT
2022-03-16 09:35:09+00:00,130,None
2022-03-16 09:40:09+00:00,134,None
2022-03-16 09:45:09+00:00,135,None
2022-03-16 09:50:09+00:00,134,None
'''
df = pd.read_csv(io.StringIO(s), index_col=0, parse_dates=['date'])
# note: the rows mix whole and fractional seconds; if df.index is not a
# DatetimeIndex after this call (possible on pandas >= 2.0), try
# df.index = pd.to_datetime(df.index, format='ISO8601')

# create intervals from -1H to +2H around every index timestamp
# (small window to suit the sample data; use dl, dr = -2, 8 for the question's range)
dl, dr = -1, 2
left = df.index + pd.Timedelta(hours=dl)
right = df.index + pd.Timedelta(hours=dr)
df['interval'] = pd.arrays.IntervalArray.from_arrays(left, right)

# plot each announcement's interval.left to interval.right
fig, ax = plt.subplots()
for ann in df.loc[df['msg_type'] == 'ANNOUNCEMENT'].itertuples():
    window = df.loc[ann.interval.left:ann.interval.right].copy()  # extract interval.left to interval.right
    window.index -= ann.Index  # compute time since announcement
    window.index = window.index.total_seconds()  # convert to seconds since announcement
    window.plot(ax=ax, y='value', label=ann.Index.date())

deltas = np.arange(dl, dr + 1)
ax.set(xticks=deltas * 3600, xticklabels=deltas)  # set tick labels to hours since announcement
ax.grid()
ax.legend()
plt.show()
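To address the question's other concern, placing extra markers at specific timestamps, the same subtraction can be wrapped in a small helper. The following is only a sketch under stated assumptions: window_in_hours is a hypothetical helper, and the data is made up (one value every 5 minutes, with announcements that happen to fall on that grid), using the question's -2 to 8 hour range.

```python
import numpy as np
import pandas as pd

def window_in_hours(df, ann_time, before=2, after=8):
    """Hypothetical helper: return df['value'] from `before` hours before to
    `after` hours after ann_time, indexed by hours since ann_time."""
    left = ann_time - pd.Timedelta(hours=before)
    right = ann_time + pd.Timedelta(hours=after)
    window = df.loc[left:right, 'value'].copy()  # label slice; needs a sorted DatetimeIndex
    window.index = (window.index - ann_time).total_seconds() / 3600
    return window

# made-up data: one value every 5 minutes over two days
idx = pd.date_range('2022-03-15', '2022-03-17', freq='5min', tz='UTC')
df = pd.DataFrame({'value': np.arange(len(idx)), 'msg_type': None}, index=idx)
df.loc[pd.Timestamp('2022-03-15 08:30', tz='UTC'), 'msg_type'] = 'ANNOUNCEMENT'
df.loc[pd.Timestamp('2022-03-16 09:30', tz='UTC'), 'msg_type'] = 'ANNOUNCEMENT'

# one re-indexed series per announcement, keyed by its date
announcements = df.index[df['msg_type'] == 'ANNOUNCEMENT']
windows = {t.date(): window_in_hours(df, t) for t in announcements}

# an extra event is placed on the shared axis with the same subtraction
event = pd.Timestamp('2022-03-15 10:00', tz='UTC')
marker_x = (event - announcements[0]).total_seconds() / 3600
print(marker_x)
```

Each series in windows can then be drawn with ax.plot(w.index, w.values), and marker_x can be passed to something like ax.axvline, with no conversion into row counts.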