How to obtain all gaps as start .. stop interval in pandas datetime index

I want to find all gaps in pandas DateTime index as a list of intervals. For example:

 '2022-05-06 00:01:00'
 '2022-05-06 00:02:00' <- Start of gap
 '2022-05-06 00:06:00' <- End of gap
 '2022-05-06 00:07:00'
 '2022-05-06 00:08:00'
 '2022-05-06 00:09:00' <- Next gap start
 '2022-05-06 05:00:00' <- End
 '2022-05-06 05:01:00'

And I want to get the following:

[('2022-05-06 00:03:00', '2022-05-06 00:05:00') , 
 ('2022-05-06 00:10:00', '2022-05-06 04:59:00')]

The frequency can be anything, but it is the same across the whole index.

CodePudding user response：

I have iterative code, but I'm looking for something more efficient:

import pandas as pd
from datetime import timedelta

# df is assumed to have a 'timestamp' index (or column)
timestamp = pd.to_datetime(df.reset_index()['timestamp'])
timestamp = timestamp.sort_values().reset_index(drop=True)

l = []
freq = timedelta(minutes=10)  # set this to the frequency of the index
# compare each timestamp with the one that follows it
for prev, curr in zip(timestamp.iloc[:-1], timestamp.iloc[1:]):
    if prev + freq != curr:
        l.append([prev + freq, curr - freq])
print(l)

CodePudding user response：

IIUC you can calculate the diff to identify the gaps. Use a mask to slice the starts and stops, then zip them into a list.

import pandas as pd

# ensure datetime
df['datetime'] = pd.to_datetime(df['datetime'])

# threshold
t = pd.Timedelta('1min')
mask = df['datetime'].diff().gt(t)

# get values
starts = df.loc[mask.shift(-1, fill_value=False), 'datetime'].add(t).astype(str)
stops = df.loc[mask, 'datetime'].sub(t).astype(str)

# build output
out = list(zip(starts, stops))

Output:

[('2022-05-06 00:03:00', '2022-05-06 00:05:00'),
 ('2022-05-06 00:10:00', '2022-05-06 04:59:00')]
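
Since the question asks about a DatetimeIndex with an arbitrary frequency, here is a rough sketch of the same diff-and-mask idea applied directly to an index. The function name `find_gaps` and its `freq` parameter are made up for illustration. When `freq` is not given, the sketch assumes the smallest step between consecutive timestamps is the frequency, which only holds if at least one pair of neighbours is not separated by a gap.

```python
import pandas as pd


def find_gaps(index, freq=None):
    """Return (start, stop) string pairs for the missing stretches of an index."""
    idx = pd.DatetimeIndex(index).sort_values()
    deltas = idx.to_series().diff()
    # Assumption: without an explicit freq, the smallest observed step is the frequency.
    step = pd.Timedelta(freq) if freq is not None else deltas.min()
    # True at every timestamp that directly follows a gap
    mask = deltas.gt(step).to_numpy()
    # last missing timestamp of each gap
    stops = idx[mask] - step
    # first missing timestamp of each gap (one step after the timestamp preceding the gap)
    starts = idx[:-1][mask[1:]] + step
    fmt = "%Y-%m-%d %H:%M:%S"
    return list(zip(starts.strftime(fmt), stops.strftime(fmt)))


idx = pd.DatetimeIndex([
    "2022-05-06 00:01:00", "2022-05-06 00:02:00", "2022-05-06 00:06:00",
    "2022-05-06 00:07:00", "2022-05-06 00:08:00", "2022-05-06 00:09:00",
    "2022-05-06 05:00:00", "2022-05-06 05:01:00",
])
gaps = find_gaps(idx)
print(gaps)
```

With a DataFrame this would be called as `find_gaps(df.index)`. Passing `freq` explicitly (for example `freq='1min'`) is probably safer whenever the frequency is known in advance.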

Used input:

              datetime
0  2022-05-06 00:01:00
1  2022-05-06 00:02:00
2  2022-05-06 00:06:00
3  2022-05-06 00:07:00
4  2022-05-06 00:08:00
5  2022-05-06 00:09:00
6  2022-05-06 05:00:00
7  2022-05-06 05:01:00
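
As a cross-check on this input, the missing timestamps can also be listed explicitly by comparing against a complete range and then grouping consecutive ones into intervals. This is only a sketch: the variable names are made up, and the 1-minute frequency is an assumption taken from the example. It is likely slower than the diff approach on long indexes, because it materialises every expected timestamp.

```python
import pandas as pd

idx = pd.DatetimeIndex([
    "2022-05-06 00:01:00", "2022-05-06 00:02:00", "2022-05-06 00:06:00",
    "2022-05-06 00:07:00", "2022-05-06 00:08:00", "2022-05-06 00:09:00",
    "2022-05-06 05:00:00", "2022-05-06 05:01:00",
])
step = pd.Timedelta("1min")  # assumed frequency of the index

# every timestamp that should exist between the first and last observation
full = pd.date_range(idx.min(), idx.max(), freq=step)
# timestamps that are expected but absent
missing = full.difference(idx)

# start a new group whenever two missing timestamps are not adjacent
s = missing.to_series()
group_id = s.diff().ne(step).cumsum()
intervals = [(g.iloc[0], g.iloc[-1]) for _, g in s.groupby(group_id)]
print(intervals)
```

Here `intervals` holds `pd.Timestamp` pairs rather than strings; they can be converted with `str()` if the string form from the question is needed.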