Get the missing column timestamps

I have a filtered time series dataframe. To see where datetime values are missing, I plot the difference between consecutive timestamps:

df.datetime.diff().plot()

Here, I can see the missing datetime values by eye as the spikes. Is there a way to get the start and stop of each stretch of missing datetime values, for example where the gap is longer than one minute?

Example dataframe:

import pandas as pd

dic = {
    # day-first date strings; quoted so the dict is valid Python
    'datetime': ['23.3.2021 23:18:00', '23.3.2021 23:19:00', '23.3.2021 23:20:00',
                 '23.3.2021 23:25:00', '23.3.2021 23:26:00', '23.3.2021 23:27:00'],
    'col1': [31.45, 38.361, 37.0, 31.49, 32.891, 41],
    'col2': [38.776, 38.444, 38.945, 38.647, 39.667, 40.0],
}
df = pd.DataFrame(dic)

CodePudding user response:

You can use:

# ensure datetime dtype (the strings are day-first, e.g. 23.3.2021)
df['datetime'] = pd.to_datetime(df['datetime'], dayfirst=True)

# flag rows whose gap to the previous timestamp exceeds 1 min
m = df['datetime'].diff().gt('1min')

# group consecutive flagged rows and take the first and last timestamp of each run
out = (df.loc[m, 'datetime'].groupby((~m).cumsum())
         .agg(start='min', stop='max')
         .reset_index(drop=True)
      )

output:

                start                stop
0 2021-03-23 23:25:00 2021-03-23 23:25:00
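
Note that the output above reports the first timestamp after the jump (23:25:00) as both start and stop. If what you want is the boundaries of the gap itself, that is, the last timestamp before the gap and the first one after it, one possible variation is to pair each flagged row with the previous row via shift(). This is only a sketch built on the example data; the names gaps and duration are made up for illustration, and the explicit format string assumes all timestamps look like the ones in the question.

```python
import pandas as pd

# example data from the question (datetime column only)
df = pd.DataFrame({
    'datetime': ['23.3.2021 23:18:00', '23.3.2021 23:19:00', '23.3.2021 23:20:00',
                 '23.3.2021 23:25:00', '23.3.2021 23:26:00', '23.3.2021 23:27:00'],
})

# assumption: every string follows the day.month.year hour:minute:second layout
df['datetime'] = pd.to_datetime(df['datetime'], format='%d.%m.%Y %H:%M:%S')

# True on rows whose step from the previous row exceeds one minute
m = df['datetime'].diff() > pd.Timedelta('1min')

# start = last timestamp before the gap, stop = first timestamp after it
gaps = pd.DataFrame({
    'start': df['datetime'].shift()[m],
    'stop': df['datetime'][m],
}).reset_index(drop=True)

# length of each gap (hypothetical extra column)
gaps['duration'] = gaps['stop'] - gaps['start']

print(gaps)
```

For the example data this should give a single row with start 2021-03-23 23:20:00 and stop 2021-03-23 23:25:00, i.e. a gap of five minutes.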