Home > Mobile >  how to extract missing datetime interval in python
how to extract missing datetime interval in python

Time:05-23

I have a date dataframe where date contains 15 min of interval. I want to find the missing datetime interval. id should be copied from previous line but value should be nan '''

 date value id
   2021-12-02 07:00:00  12456677    693214
   2021-01-02 07:30:00  12456677  693214
   2021-01-02 07:45:00  12456677  693214
   2021-01-02 08:00:00  12456677 693214
   2021-01-02 08:15:00  12456665  693215
   2021-01-02 08:45:00  12456665  693215
   2021-01-03 09:00:00  12456666 693217
   2021-01-03 10:30:00  12456666   693217
expacted output is

date value id
   
   2021-01-02 08:30:00  NAN  693215
   2021-01-02 09:15:00 NAN    693217
   2021-01-03 09:30:00 NAN    693217
   2021-01-03 09:45:00 NAN    693217
   2021-01-03 10:00:00  NAN   693217

I am trying

df['Datetime'] = pd.to_datetime(df['date'])
df[ df['Datetime'].diff() > pd.Timedelta('15min') ]

but it is just giving a time after which date is missing. not the missing date and time.it showed me this output

date value id

   2021-01-02 08:15:00  12456665  693215
   2021-01-03 09:00:00  12456666 693217
   2021-01-03 10:30:00  12456666   693217

can some one please guide me how can I extract missing date and time? Thanks in advance

CodePudding user response:

Use Series.asfreq per groups for get missing intervals:

#create DatetimeIndex
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')

#add 15 Minutes index per days and per id
df1 = (df.groupby([pd.Grouper(freq='D'), 'id'])['value']
         .apply(lambda x: x.asfreq('15min'))
         .reset_index(level=0, drop=True)
         .reset_index())
print (df1)
        id                date       value
0   693214 2021-01-02 07:30:00  12456677.0
1   693214 2021-01-02 07:45:00  12456677.0
2   693214 2021-01-02 08:00:00  12456677.0
3   693215 2021-01-02 08:15:00  12456665.0
4   693215 2021-01-02 08:30:00         NaN
5   693215 2021-01-02 08:45:00  12456665.0
6   693217 2021-01-03 09:00:00  12456666.0
7   693217 2021-01-03 09:15:00         NaN
8   693217 2021-01-03 09:30:00         NaN
9   693217 2021-01-03 09:45:00         NaN
10  693217 2021-01-03 10:00:00         NaN
11  693217 2021-01-03 10:15:00         NaN
12  693217 2021-01-03 10:30:00  12456666.0
13  693214 2021-12-02 07:00:00  12456677.0

Test missing values in boolean indexing:

df2 = df1[df1['value'].isna()]
print (df2)
        id                date  value
4   693215 2021-01-02 08:30:00    NaN
7   693217 2021-01-03 09:15:00    NaN
8   693217 2021-01-03 09:30:00    NaN
9   693217 2021-01-03 09:45:00    NaN
10  693217 2021-01-03 10:00:00    NaN
11  693217 2021-01-03 10:15:00    NaN
  • Related