I have data coming in every second from a sensor into a csv file, but not sure whether there was a time when device was off and we could not receive the data i.e a particular timestamp is missed. Is there way to check with Python or Pandas that at which timestamp was not recorded. For example at 15:00:01 there was data and then the next data point was at 15:00:03, ideally the data should have been at 15:00:02. So is there way I can check that 15:00:02 is missing from the csv file ?
CodePudding user response:
If your timestamp data is just like '15:00:01', one thing could be to check if the lists of your constructed 'expected' timestamp lists with a 'S' frequency is not in the list of your original timestamp. So something like
list1=list(df['time'].astype(str)) #if df['time'] values in datetime.time format
list2=pd.date_range(df['time'].astype(str).iloc[0], df['time'].astype(str).iloc[-1], freq="S").strftime('%H:%M:%S')
[x for x in list2 if x not in list1]
CodePudding user response:
You could create a template DF that contains all the seconds and then overlay (index match) your sensor data on that. This would leave individual seconds that contain no sensor value with a NaN
.
Our test date range:
full_ts_secs = pd.date_range('2022-05-05 15:00:00', end='2022-05-05 15:00:09', freq='s')
full_ts_secs
DatetimeIndex(['2022-05-05 15:00:00', '2022-05-05 15:00:01',
'2022-05-05 15:00:02', '2022-05-05 15:00:03',
'2022-05-05 15:00:04', '2022-05-05 15:00:05',
'2022-05-05 15:00:06', '2022-05-05 15:00:07',
'2022-05-05 15:00:08', '2022-05-05 15:00:09'],
dtype='datetime64[ns]', freq='S')
Our sensor data DF with every other second not present:
sensor_df = pd.DataFrame(data=range(10, 15), index=full_ts_secs[::2], columns=['value'])
sensor_df
value
2022-05-05 15:00:00 10
2022-05-05 15:00:02 11
2022-05-05 15:00:04 12
2022-05-05 15:00:06 13
2022-05-05 15:00:08 14
Then overlay the sensor data onto the the template containing all the expected seconds:
pd.concat([pd.DataFrame(index=full_ts_secs), sensor_df], axis=1)
value
2022-05-05 15:00:00 10.0
2022-05-05 15:00:01 NaN
2022-05-05 15:00:02 11.0
2022-05-05 15:00:03 NaN
2022-05-05 15:00:04 12.0
2022-05-05 15:00:05 NaN
2022-05-05 15:00:06 13.0
2022-05-05 15:00:07 NaN
2022-05-05 15:00:08 14.0
2022-05-05 15:00:09 NaN
The seconds that don't have sensor data are now represented by NaN
s.