I have a date dataframe where date contains 15 min of interval. I want to find the missing datetime interval. id should be copied from previous line but value should be nan '''
date value id
2021-12-02 07:00:00 12456677 693214
2021-01-02 07:30:00 12456677 693214
2021-01-02 07:45:00 12456677 693214
2021-01-02 08:00:00 12456677 693214
2021-01-02 08:15:00 12456665 693215
2021-01-02 08:45:00 12456665 693215
2021-01-03 09:00:00 12456666 693217
2021-01-03 10:30:00 12456666 693217
expacted output is
date value id
2021-01-02 08:30:00 NAN 693215
2021-01-02 09:15:00 NAN 693217
2021-01-03 09:30:00 NAN 693217
2021-01-03 09:45:00 NAN 693217
2021-01-03 10:00:00 NAN 693217
I am trying
df['Datetime'] = pd.to_datetime(df['date'])
df[ df['Datetime'].diff() > pd.Timedelta('15min') ]
but it is just giving a time after which date is missing. not the missing date and time.it showed me this output
date value id
2021-01-02 08:15:00 12456665 693215
2021-01-03 09:00:00 12456666 693217
2021-01-03 10:30:00 12456666 693217
can some one please guide me how can I extract missing date and time? Thanks in advance
CodePudding user response:
Use Series.asfreq
per groups for get missing intervals:
#create DatetimeIndex
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')
#add 15 Minutes index per days and per id
df1 = (df.groupby([pd.Grouper(freq='D'), 'id'])['value']
.apply(lambda x: x.asfreq('15min'))
.reset_index(level=0, drop=True)
.reset_index())
print (df1)
id date value
0 693214 2021-01-02 07:30:00 12456677.0
1 693214 2021-01-02 07:45:00 12456677.0
2 693214 2021-01-02 08:00:00 12456677.0
3 693215 2021-01-02 08:15:00 12456665.0
4 693215 2021-01-02 08:30:00 NaN
5 693215 2021-01-02 08:45:00 12456665.0
6 693217 2021-01-03 09:00:00 12456666.0
7 693217 2021-01-03 09:15:00 NaN
8 693217 2021-01-03 09:30:00 NaN
9 693217 2021-01-03 09:45:00 NaN
10 693217 2021-01-03 10:00:00 NaN
11 693217 2021-01-03 10:15:00 NaN
12 693217 2021-01-03 10:30:00 12456666.0
13 693214 2021-12-02 07:00:00 12456677.0
Test missing values in boolean indexing
:
df2 = df1[df1['value'].isna()]
print (df2)
id date value
4 693215 2021-01-02 08:30:00 NaN
7 693217 2021-01-03 09:15:00 NaN
8 693217 2021-01-03 09:30:00 NaN
9 693217 2021-01-03 09:45:00 NaN
10 693217 2021-01-03 10:00:00 NaN
11 693217 2021-01-03 10:15:00 NaN