Background: My dataset acquires values at roughly 5-minute intervals, but sometimes there are gaps. I am charting my dataset using Plotly and trying to resolve an issue where a straight line is drawn between points whenever there is a gap in the data. Plotly has a parameter, connectgaps, which, if set to False, will not connect across NaN values. However, my dataset has no NaN values at the gaps; it looks like this:
(where I have computed the time difference to the next row using df['time_diff_mins'] = (df['datetime'].shift(-1) - df['datetime']).dt.total_seconds() / 60)
datetime value time_diff_mins
0 2022-03-09 09:25:00 98 5
1 2022-03-09 09:30:00 104 21
2 2022-03-09 09:51:00 105 3
3 2022-03-09 09:54:00 110 nan
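For anyone who wants to reproduce this, the sample frame above can be rebuilt and the time_diff_mins column computed as follows. This is a minimal sketch that assumes the datetime column holds strings that pandas can parse; the values are copied from the table.

```python
import pandas as pd

# Sample data copied from the table above
df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2022-03-09 09:25:00",
        "2022-03-09 09:30:00",
        "2022-03-09 09:51:00",
        "2022-03-09 09:54:00",
    ]),
    "value": [98, 104, 105, 110],
})

# Minutes until the *next* reading; the last row has no successor,
# so its difference is NaT and total_seconds() turns that into NaN
df["time_diff_mins"] = (
    df["datetime"].shift(-1) - df["datetime"]
).dt.total_seconds() / 60

print(df)
```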
If you look at rows 1 and 2, the time difference is 21 minutes. For this reason, I don't want the values 104 and 105 to be connected; I want a break in the line wherever the gap is greater than 15 minutes and 15 seconds.
So I am trying to insert a new row with null/NaN values into my dataframe wherever the time difference between consecutive rows exceeds 15 minutes and 15 seconds, so that Plotly leaves a break in the line there.
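Finding the rows that are followed by such a gap is a vectorised comparison, so no row-by-row loop is needed. A minimal sketch, assuming the threshold of 15 minutes 15 seconds from the question; the names threshold, gap_to_next and needs_break are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2022-03-09 09:25:00",
        "2022-03-09 09:30:00",
        "2022-03-09 09:51:00",
        "2022-03-09 09:54:00",
    ]),
    "value": [98, 104, 105, 110],
})

# Threshold from the question: 15 minutes 15 seconds
threshold = pd.Timedelta(minutes=15, seconds=15)

# Gap between each row and the next one (NaT for the last row)
gap_to_next = df["datetime"].shift(-1) - df["datetime"]

# True where a NaN row should be inserted *after* this row;
# comparisons against NaT evaluate to False, so the last row is never flagged
needs_break = gap_to_next > threshold
print(needs_break.tolist())  # [False, True, False, False]
```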
Desired output:
datetime value
0 2022-03-09 09:25:00 98
1 2022-03-09 09:30:00 104
2 2022-03-09 09:40:30 nan
3 2022-03-09 09:51:00 105
4 2022-03-09 09:54:00 110
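The 09:40:30 timestamp in the desired output is the midpoint of the gap between 09:30:00 and 09:51:00. Presumably any timestamp between the two readings would break the line, since the break comes from the NaN value; the midpoint is simply a natural choice. The arithmetic, as a small sketch:

```python
import pandas as pd

before = pd.Timestamp("2022-03-09 09:30:00")
after = pd.Timestamp("2022-03-09 09:51:00")

# Halfway between the two readings: 09:30 + (21 min / 2) = 09:40:30
midpoint = before + (after - before) / 2
print(midpoint)  # 2022-03-09 09:40:30
```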
I hope that makes sense. I suspect that inserting rows one at a time is not an efficient approach, but I haven't been able to find a better one. Thank you in advance for your time and help.
Answer:
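You can use a mask to find the rows followed by a gap, build new rows holding the midpoint timestamps, and combine them with the original frame using pandas.concat:
import pandas as pd

df['datetime'] = pd.to_datetime(df['datetime'])
delta = pd.Timedelta(minutes=15, seconds=15)   # gap threshold

# time from each row to the next one (NaT for the last row)
d = df['datetime'].diff().shift(-1)

out = (pd.concat([df,
                  # midpoint timestamps, only where the gap exceeds delta
                  df['datetime'].add(d/2)
                                .loc[d.gt(delta)].to_frame()
                 ])
       .sort_index(kind='stable')   # keep each new row after the row it follows
      )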
Output:
datetime value time_diff_mins
0 2022-03-09 09:25:00 98.0 5.0
1 2022-03-09 09:30:00 104.0 21.0
1 2022-03-09 09:40:30 NaN NaN
2 2022-03-09 09:51:00 105.0 3.0
3 2022-03-09 09:54:00 110.0 NaN
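If you need this for several frames, the same steps can be wrapped in a reusable function that also renumbers the index so the result matches the desired output exactly. This is a sketch under stated assumptions: insert_gap_rows is a hypothetical helper (not a pandas API), and it assumes the frame is already sorted by time and has a unique index, as in the example.

```python
import pandas as pd


def insert_gap_rows(df, time_col="datetime",
                    threshold=pd.Timedelta(minutes=15, seconds=15)):
    """Hypothetical helper (not a pandas API): return a copy of `df` with a
    row inserted halfway across every gap longer than `threshold`. The new
    rows hold only a timestamp; every other column is NaN."""
    times = pd.to_datetime(df[time_col])
    gap = times.shift(-1) - times            # NaT for the last row
    mask = gap > threshold                   # comparisons with NaT are False
    # One filler row per flagged gap, stamped at the midpoint of the gap
    fillers = (times + gap / 2)[mask].to_frame(name=time_col)
    out = pd.concat([df.assign(**{time_col: times}), fillers])
    # A stable sort on the index keeps each filler right after the row it
    # follows (they share an index label); then renumber 0..n-1
    return out.sort_index(kind="stable").reset_index(drop=True)


df = pd.DataFrame({
    "datetime": ["2022-03-09 09:25:00", "2022-03-09 09:30:00",
                 "2022-03-09 09:51:00", "2022-03-09 09:54:00"],
    "value": [98, 104, 105, 110],
})
out = insert_gap_rows(df)
print(out)
```

Sorting on the index rather than on the timestamps relies on the assumption above that the input is already in time order; if that is not guaranteed, sorting the input by time before calling the helper is the safer option.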