Python plotting on/off data using Matplotlib

Time:03-17

I'm trying to plot whether a bunch of devices are online or offline. Each device sends a signal 1 when it comes online and a signal 0 when it goes offline. In between, there's no data.

For just one device I use a step plot (with where='post'), which works pretty well. Now I want to show with a line when one or more devices are online.

Does anyone have any tips/tricks on how to visualize this dataset? I've tried adding extra rows just before each signal to get a more continuous dataset and then plot the value of OnOff, but then I lose the categories. Do I need to convert this to a broken_barh plot? Or any other ideas?

Example figure

Data:

from io import StringIO

import pandas as pd
import matplotlib.pyplot as plt

TESTDATA = u"""\
Index;OnOff;Device
12-10-2021 10:04:04;1;device1
12-10-2021 10:04:12;0;device3
12-10-2021 10:05:05;1;device2
12-10-2021 19:05:11;0;device2
13-10-2021 05:25:17;1;device2
13-10-2021 19:26:22;0;device2
14-10-2021 15:44:44;1;device2
14-10-2021 20:54:12;0;device2
15-10-2021 04:21:42;1;device2
15-10-2021 09:15:11;0;device2
15-10-2021 17:05:05;0;device1
15-10-2021 17:05:25;1;device3
15-10-2021 17:56:45;1;device1
15-10-2021 17:57:09;1;device2
15-10-2021 21:10:20;0;device2
16-10-2021 01:51:50;1;device2
19-10-2021 10:00:13;0;device1
19-10-2021 10:04:19;0;device2
"""

df = pd.read_csv(StringIO(TESTDATA), index_col=0, sep=';', engine='python')
df.index = pd.to_datetime(df.index, format='%d-%m-%Y %H:%M:%S')
print(df)

# plot
fig, ax = plt.subplots(figsize=[16,9])

devices = list(set(df['Device']))
devices.sort(reverse=True)

for device in devices:
    ax.plot(df.index[df['Device'] == device], df['Device'][df['Device'] == device], label=device)
plt.show()

CodePudding user response:

The problem is in the arguments passed to ax.plot. ax.plot expects x and y, e.g. ax.plot(x, y). Yours are:

x: df.index[df['Device'] == device] (this is correct)
y: df['Device'][df['Device'] == device] (this is not correct, because it plots the device name instead of its on/off state)

Change df['Device'][df['Device'] == device] to df.loc[df['Device'] == device, 'OnOff'].

df.loc works by filtering rows and then columns:

df.loc[row_filter, column_filter]
row_filter = df['Device'] == device # give me all rows where the 'Device' column's value == the device variable's value
column_filter = 'OnOff' # give me just the OnOff column
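
As a minimal sketch of the df.loc filtering described above, the snippet below builds a three-row frame from the first rows of the question's data and selects the x and y values for one device. The choice of "device2" and the variable names x and y are only for illustration.

```python
import pandas as pd

# Three rows taken from the question's data, indexed by timestamp
df = pd.DataFrame(
    {"OnOff": [1, 0, 1], "Device": ["device1", "device3", "device2"]},
    index=pd.to_datetime(
        ["12-10-2021 10:04:04", "12-10-2021 10:04:12", "12-10-2021 10:05:05"],
        format="%d-%m-%Y %H:%M:%S",
    ),
)

device = "device2"                   # illustrative choice
row_filter = df["Device"] == device  # boolean mask: True for rows of this device
column_filter = "OnOff"              # keep only the OnOff column

x = df.index[row_filter]               # timestamps of this device's signals
y = df.loc[row_filter, column_filter]  # matching on/off values
print(y)
```

Here x and y have the same length and line up row by row, which is what ax.plot or ax.step needs.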

The graph you will see may not be what you want.

You may want to replace ax.plot with ax.step, but the data will overlap and won't be very readable.

A final solution may be to draw three axes, one per device, on a shared x axis:

# plot
fig, axs = plt.subplots(3,1, figsize=[16,9], sharex=True)

devices = list(set(df['Device']))
devices.sort(reverse=True)

for device_idx, device in enumerate(devices):
    # where='post' keeps each state until the next signal arrives
    axs[device_idx].step(df.index[df['Device'] == device], df.loc[df['Device'] == device, 'OnOff'], where='post', label=device)
plt.show()


CodePudding user response:

Datetime objects are indeed awkward to work with, because not all pandas/numpy/matplotlib functions accept every datetime type, and some interpret them differently. However, we can convert

Most of the code is only needed to handle cases where the status at the beginning or end is not explicitly given in the original dataframe. Because the status is either 1 or 0, however, the coding is easier, as the values can be used directly as indexes.
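
The code of this answer is not preserved here, so the following is only a hedged sketch of the general idea, not the original implementation. It turns the on/off events into (start, end) intervals per device and draws them with broken_barh after converting the timestamps to matplotlib date numbers. The function names online_intervals and plot_intervals are hypothetical, the sample is a subset of the question's rows, and the handling of undeclared states is an assumption: a device whose first signal is 0 is treated as online since the first timestamp in the data, and a device whose last signal is 1 is treated as online until the last timestamp.

```python
from io import StringIO

import pandas as pd

# Subset of the question's data (device1 and device3 only)
SAMPLE = """\
Index;OnOff;Device
12-10-2021 10:04:04;1;device1
12-10-2021 10:04:12;0;device3
15-10-2021 17:05:05;0;device1
15-10-2021 17:05:25;1;device3
15-10-2021 17:56:45;1;device1
19-10-2021 10:00:13;0;device1
"""

df = pd.read_csv(StringIO(SAMPLE), index_col=0, sep=";")
df.index = pd.to_datetime(df.index, format="%d-%m-%Y %H:%M:%S")


def online_intervals(df):
    """Return {device: [(start, end), ...]} for the periods a device is online."""
    t_min, t_max = df.index.min(), df.index.max()
    intervals = {}
    for device, events in df.sort_index().groupby("Device"):
        spans, start, first = [], None, True
        for ts, state in events["OnOff"].items():
            if state == 1 and start is None:
                start = ts  # device comes online
            elif state == 0:
                if start is None and first:
                    start = t_min  # assumption: online since the data began
                if start is not None and ts > start:
                    spans.append((start, ts))
                start = None  # device is offline now
            first = False
        if start is not None:
            spans.append((start, t_max))  # assumption: online until the data ends
        intervals[device] = spans
    return intervals


def plot_intervals(intervals):
    """Draw one row of bars per device; returns (fig, ax)."""
    # Imported here so the interval logic can be used without matplotlib
    import matplotlib.dates as mdates
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(16, 4))
    names = sorted(intervals)
    for row, device in enumerate(names):
        # broken_barh expects (left edge, width) pairs in axis units
        xranges = [
            (mdates.date2num(s), mdates.date2num(e) - mdates.date2num(s))
            for s, e in intervals[device]
        ]
        ax.broken_barh(xranges, (row - 0.4, 0.8))
    ax.set_yticks(range(len(names)))
    ax.set_yticklabels(names)
    ax.xaxis_date()  # show the numeric x values as dates
    return fig, ax


intervals = online_intervals(df)
```

Calling plot_intervals(intervals) followed by plt.show() would then display the bars. Keeping the interval computation separate from the plotting makes the boundary assumptions easy to change.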

P.S. The first bar of device 3 is visible in the original plot but not in the downsampled image stored here on SO.

CodePudding user response:

# First, I would assume that all devices start in an unknown state instead of assuming they are off,
# so let's add one row at the beginning
new_index_first_element = df.index[0] - pd.Timedelta(seconds=1)
new_index = [new_index_first_element] + df.index.to_list()

devices = sorted(df.Device.unique())

# Let's create a new dataframe where each device has its own column and
# each row tracks the state of every device
df2 = pd.DataFrame(index=new_index, columns=devices)

for i_iloc in range(1, len(df2)):  # I need to refer to the previous row, so I use iloc instead of loc
    # first copy the previous status of all devices to the current row
    df2.iloc[i_iloc] = df2.iloc[i_iloc - 1]

    # now update the status of the device that changed
    # (row i_iloc of df2 corresponds to row i_iloc - 1 of df because of the extra first row)
    device_to_update = df['Device'].iloc[i_iloc - 1]
    status_to_update = df['OnOff'].iloc[i_iloc - 1]
    df2.iloc[i_iloc, df2.columns.get_loc(device_to_update)] = status_to_update

df2


This is what the DF will look like. It has an additional first row of NaNs, as we do not know the status of the devices at that point.

# and plot
fig, ax = plt.subplots(figsize=[16,9])
df2.plot(kind='bar', stacked=True, color=['red', 'skyblue', 'green'], ax=ax)


I don't think a broken_barh plot will do a good job here; this stacked bar plot will work much better.
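
As a hedged alternative to the explicit loop, a per-device state table of the same kind can be sketched with pivot and ffill, and the "one or more devices online" signal from the question can then be derived from it. This is a sketch under stated assumptions, not part of the answer above: the names states and any_online are illustrative, the sample is a subset of the question's rows, and a device whose state is still unknown (NaN) is counted as not online.

```python
from io import StringIO

import pandas as pd

# Subset of the question's data (device1 and device3 only)
SAMPLE = """\
Index;OnOff;Device
12-10-2021 10:04:04;1;device1
12-10-2021 10:04:12;0;device3
15-10-2021 17:05:05;0;device1
15-10-2021 17:05:25;1;device3
15-10-2021 17:56:45;1;device1
19-10-2021 10:00:13;0;device1
"""

df = pd.read_csv(StringIO(SAMPLE), index_col=0, sep=";")
df.index = pd.to_datetime(df.index, format="%d-%m-%Y %H:%M:%S")

# One column per device; a value appears only at that device's own signals,
# then ffill carries the last known state forward to the following rows.
# Rows before a device's first signal stay NaN (unknown state).
states = df.pivot(columns="Device", values="OnOff").ffill()

# 1 where at least one device is known to be online, else 0
# (assumption: an unknown state does not count as online)
any_online = states.eq(1).any(axis=1).astype(int)
print(any_online)
```

The resulting series can be drawn as a single line with ax.step(any_online.index, any_online, where='post'). Note that pivot requires each (timestamp, device) pair to be unique, which holds for the data in the question.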
