Plotly with datetime.time() in the x-axis and missing values-CodePudding

I have 2 pandas dataframes, df1 and df2 which both have data from 2 different days between 21:00 and 8:00. The data should be 1 data point per minute, however there are there are missing values e.g.

       location        time             Data
0          1         21:00:00            8
1          1         21:02:00            6

the data point for 21:01:00 does not exist. The missing data points occur at different times for each of the dataframes, so when I try to plot both of them on the same plot this happens:

If I plot them individually they're both correct. I think the horizontal red lines are caused by the time values that exist in the red dataframe but not in the blue dataframe.

Has anyone encountered this before? I want to plot both of them on the same axis, starting at 21:00 and finishing at 08:00.

Here is the code I'm using:

import pandas as pd
import plotly.express as px

df1 = pd.DataFrame({'location': 1,
                    'data': ['3', '4', '5'], 
                   'time': [datetime.datetime(2022,7,16,21,0,0).time(), 
                            datetime.datetime(2022,7,16,21,1,0).time(), 
                            datetime.datetime(2022,7,16,21,3,0).time()]})
df2 = pd.DataFrame({'location': 2,
                    'data': ['8', '6', '7'], 
                   'time': [datetime.datetime(2022,7,17,21,0,0).time(), 
                            datetime.datetime(2022,7,17,21,2,0).time(), 
                            datetime.datetime(2022,7,17,21,3,0).time()]})

df = pd.concat([df1,df2], axis=0)

fig = px.line(df, x="time", y="data", color='location')
fig.show()

Thanks!

CodePudding user response：

The problem is with the time column. As you convert it to time(), this will be converted to object when you combine the dataframes. Check df.info(). To avoid this, leave the data in datetime format and use update_axis() to let px set the time. Updated code below...

import pandas as pd
import plotly.express as px

df1 = pd.DataFrame({'location': 1,
                    'data': ['3', '4', '5'], 
                   'time': [datetime.datetime(2022,7,16,21,0,0), 
                            datetime.datetime(2022,7,16,21,1,0), 
                            datetime.datetime(2022,7,16,21,3,0)]})
df2 = pd.DataFrame({'location': 2,
                    'data': ['8', '6', '7'], 
                   'time': [datetime.datetime(2022,7,16,21,0,0), 
                            datetime.datetime(2022,7,16,21,2,0), 
                            datetime.datetime(2022,7,16,21,3,0)]})

df = pd.concat([df1,df2], axis=0)

fig = px.line(df, x="time", y="data", color='location')
fig.update_xaxes(tickformat="%H:%M:%S")
fig.show()

Plot

EDIT

As per your updated requirement, you want to remove the date section and show ONLY time, without caring for the date. To do this, you can go back to taking only the time() in your dataframe creation. Post concatenation of the dfs, use a dummy date (2022-01-01 here) and create the datetime and plot. This will give you below graph.

## Note that you need to use .time()
df1 = pd.DataFrame({'location': 1, 'data': ['3', '4', '5'], 
                   'time': [datetime.datetime(2022,7,17,21,0,0).time(), 
                            datetime.datetime(2022,7,17,21,1,0).time(), 
                            datetime.datetime(2022,7,17,21,3,0).time()]})
df2 = pd.DataFrame({'location': 2, 'data': ['8', '6', '7'], 
                   'time': [datetime.datetime(2022,7,16,21,0,0).time(), 
                            datetime.datetime(2022,7,16,21,2,0).time(), 
                            datetime.datetime(2022,7,16,21,3,0).time()]})

df = pd.concat([df1,df2], axis=0)
date = str(datetime.datetime.strptime('2022-01-01', '%Y-%m-%d').date())  ##Random dummy date
df['time'] = pd.to_datetime(date   " "   df['time'].astype(str)) ##Convert back to datetime
fig = px.line(df, x="time", y="data", color='location')
fig.update_xaxes(tickformat="%H:%M")
fig.show()

Now, for the second part of your requirement - the graph needs to always show 9PM to 8AM. To do this, you will need to use the range_x with the start and end times. Replace above px.line() with these lines...

dt = datetime.datetime.strptime('2022-01-01', '%Y-%m-%d')
starttime = dt.replace(hour=21, minute=0) ## Start time is 9PM
dt = datetime.datetime.strptime('2022-01-02', '%Y-%m-%d')
endtime = dt.replace(hour=8, minute=0) ## End time is 8AM next day
fig = px.line(df, x="time", y="data", color='location', range_x=[starttime, endtime])

Please note that I have shown both plots to showcase that, if you take a longer period (9 to 8), the lines will just look like vertical lines. Hope you are okay with that.

CodePudding user response：

started by simulating data that has the features you describe. From 21:00 to 08:00 on different dates and with different randomly removed minutes
now integrate this data. Have taken approach
1. fill missing minutes by outer join to all minutes in each dataframe
2. outer join the two data frames on time only

This gives a different struct data frame:

	location_x	time_x	Data_x	t	location_y	time_y	Data_y
0	1	2022-09-01 21:00:00	0	21:00:00	2	2022-09-04 21:00:00	1
1	1	2022-09-01 21:01:00	0.0302984	21:01:00	2	2022-09-04 21:01:00	0.999541
2	1	2022-09-01 21:02:00	0.060569	21:02:00	2	2022-09-04 21:02:00	0.998164
3	1	2022-09-01 21:03:00	0.0907839	21:03:00	2	2022-09-04 21:03:00	0.995871
4	1	2022-09-01 21:04:00	0.120916	21:04:00	2	2022-09-04 21:04:00	nan

This is then simple to generate a px.line() figure from. Traces being Data_x and Data_y. Have used datetime column time_x for xaxis. This then works well as datetime and continuous axes are well integrated. Updated tickformat so date part of axis is not displayed.

import pandas as pd
import numpy as np
import plotly.express as px

dr = pd.date_range("2022-09-01 21:00", "2022-09-02 08:00", freq="1Min")

# data to match question, two dataframes from 21:00 to 08:00, different dates with some holes
# with different dates
dfs = [
    pd.DataFrame(
        {
            "location": np.full(len(dr), l),
            "time": dr   pd.DateOffset(days=o),
            "Data": f(np.linspace(0, 20, len(dr))),
        }
    )
    .sample(frac=0.95)
    .sort_index()
    for l, o, f in zip([1, 2], [0, 3], [np.sin, np.cos])
]


df1 = dfs[0]
df2 = dfs[1]

# let's integrate the dataframes
# 1. fill the holes in each dataframe by doing an outer join to all times
# 2. outer join the two dataframes on just the time
df = pd.merge(
    *[
        pd.merge(
            d,
            pd.DataFrame(
                {"time": pd.date_range(d["time"].min(), d["time"].max(), freq="1min")}
            ),
            on="time",
            how="outer",
        )
        .fillna({"location": l})
        .assign(t=lambda d: d["time"].dt.time)
        for d, l in zip([df1, df2], [1, 2])
    ],
    on="t",
    how="outer",
)


# finally generate plotly line chart using columns created by merging the data
# it's clearly observed there are gaps in both traces
px.line(
    df.sort_values("time_x"), x="time_x", y=["Data_x", "Data_y"], hover_data=["time_y"]
).update_layout({"xaxis": {"tickformat": "%H:%M"}})

output

CodePudding user response：

Thank you for your help @Redox it was very helpful but unfortunately doesn't work as I want it to when using the full datasets. This is the result for the equivalent of this:

## Note that you need to use .time()
df1 = pd.DataFrame({'location': 1, 'data': ['3', '4', '5'], 
                   'time': [datetime.datetime(2022,7,17,21,0,0).time(), 
                            datetime.datetime(2022,7,17,21,1,0).time(), 
                            datetime.datetime(2022,7,17,21,3,0).time()]})
df2 = pd.DataFrame({'location': 2, 'data': ['8', '6', '7'], 
                   'time': [datetime.datetime(2022,7,16,21,0,0).time(), 
                            datetime.datetime(2022,7,16,21,2,0).time(), 
                            datetime.datetime(2022,7,16,21,3,0).time()]})

df = pd.concat([df1,df2], axis=0)
date = str(datetime.datetime.strptime('2022-01-01', '%Y-%m-%d').date())  ##Random dummy date
df['time'] = pd.to_datetime(date   " "   df['time'].astype(str)) ##Convert back to datetime
fig = px.line(df, x="time", y="data", color='location')
fig.update_xaxes(tickformat="%H:%M")
fig.show()

When I try this:

dt = datetime.datetime.strptime('2022-01-01', '%Y-%m-%d')
starttime = dt.replace(hour=21, minute=0) ## Start time is 9PM
dt = datetime.datetime.strptime('2022-01-02', '%Y-%m-%d')
endtime = dt.replace(hour=8, minute=0) ## End time is 8AM next day
fig = px.line(df, x="time", y="data", color='location', range_x=[starttime, endtime])

This is the result:

CodePudding user response：

Here is what worked for me eventually:

df1 = pd.DataFrame({'location': 1, 'data': ['3', '4', '5'], 
                       'time_num': [datetime.datetime(2022,7,17,21,0,0).time().hour   datetime.datetime(2022,7,17,21,0,0).time().minute/60, 
                                datetime.datetime(2022,7,17,21,1,0).time().hour   datetime.datetime(2022,7,17,21,0,0).time().minute/60,  
                                datetime.datetime(2022,7,17,21,3,0).time().hour   datetime.datetime(2022,7,17,21,0,0).time().minute/60, ]})
df2 = pd.DataFrame({'location': 2, 'data': ['8', '6', '7'], 
                       'time_num': [datetime.datetime(2022,7,16,21,0,0).time().hour   datetime.datetime(2022,7,16,21,0,0).time().minute/60, 
                                datetime.datetime(2022,7,16,21,2,0).time().hour   datetime.datetime(2022,7,16,21,2,0).time().minute/60, 
                                datetime.datetime(2022,7,16,21,3,0).time().hour   datetime.datetime(2022,7,16,21,3,0).time().minute/60]})
    
df_skeleton = pd.DataFrame()
df_skeleton['date'] = pd.date_range(datetime.datetime(2022,7,16,20,0,0), datetime.datetime(2022,7,17,8,0,0), freq = '1min')
df_skeleton['time']=df_test['date'].dt.strftime('%H:%M:%S')
df_skeleton['hour']=df_test['date'].dt.strftime('%H')
df_skeleton['min']=df_test['date'].dt.strftime('%M')
df_skeleton[['hour', 'min']] = df_test[['hour', 'min']].astype(int)
df_skeleton['time_num'] = df_test['hour']   df_test['min']/60

result_1 = pd.merge(df_skeleton, df1, how="left", on=["time_num", "time_num"])
result_2 = pd.merge(df_skeleton, df2, how="left", on=["time_num", "time_num"])
result_1['location'] = '1'
fig = px.line(result_1, x='time', y='data',color='location')
fig.add_scatter(x=result_2['time'], y=result_2['data'],mode='lines', name='2')
fig.update_traces(connectgaps=True)
fig.show()

I'm not overly pleased with it but it works both with the dummy dataframes and the full dataframes.