Home > Mobile >  Timeseries Data - Every 1 second
Timeseries Data - Every 1 second

Time:05-06

I have data coming in every second from a sensor into a csv file, but not sure whether there was a time when device was off and we could not receive the data i.e a particular timestamp is missed. Is there way to check with Python or Pandas that at which timestamp was not recorded. For example at 15:00:01 there was data and then the next data point was at 15:00:03, ideally the data should have been at 15:00:02. So is there way I can check that 15:00:02 is missing from the csv file ?

CodePudding user response:

If your timestamp data is just like '15:00:01', one thing could be to check if the lists of your constructed 'expected' timestamp lists with a 'S' frequency is not in the list of your original timestamp. So something like

list1=list(df['time'].astype(str)) #if df['time'] values in datetime.time format
list2=pd.date_range(df['time'].astype(str).iloc[0], df['time'].astype(str).iloc[-1], freq="S").strftime('%H:%M:%S')

[x for x in list2 if x not in list1]

CodePudding user response:

You could create a template DF that contains all the seconds and then overlay (index match) your sensor data on that. This would leave individual seconds that contain no sensor value with a NaN.

Our test date range:

full_ts_secs = pd.date_range('2022-05-05 15:00:00', end='2022-05-05 15:00:09', freq='s')
full_ts_secs

DatetimeIndex(['2022-05-05 15:00:00', '2022-05-05 15:00:01',
               '2022-05-05 15:00:02', '2022-05-05 15:00:03',
               '2022-05-05 15:00:04', '2022-05-05 15:00:05',
               '2022-05-05 15:00:06', '2022-05-05 15:00:07',
               '2022-05-05 15:00:08', '2022-05-05 15:00:09'],
              dtype='datetime64[ns]', freq='S') 

Our sensor data DF with every other second not present:

sensor_df = pd.DataFrame(data=range(10, 15), index=full_ts_secs[::2], columns=['value'])
sensor_df

                     value
2022-05-05 15:00:00     10
2022-05-05 15:00:02     11
2022-05-05 15:00:04     12
2022-05-05 15:00:06     13
2022-05-05 15:00:08     14

Then overlay the sensor data onto the the template containing all the expected seconds:

pd.concat([pd.DataFrame(index=full_ts_secs), sensor_df], axis=1)

                 value
2022-05-05 15:00:00   10.0
2022-05-05 15:00:01    NaN
2022-05-05 15:00:02   11.0
2022-05-05 15:00:03    NaN
2022-05-05 15:00:04   12.0
2022-05-05 15:00:05    NaN
2022-05-05 15:00:06   13.0
2022-05-05 15:00:07    NaN
2022-05-05 15:00:08   14.0
2022-05-05 15:00:09    NaN

The seconds that don't have sensor data are now represented by NaNs.

  • Related