I have a CSV that's few thousand rows long. It contains data sent from various devices. They should transmit frequently (every 10 minutes) however sometimes there is a lag. I'm trying to write a program that will highlight all instances where the delay between two readings is greater than 15 minutes
I've made a functional code that works, but with this code I first have to manually edit the CSV to change the "eventTime" variable from time format (e.g. 03:22:00) to a float value based on 1/24 (e.g. 03:22:00 becomes 0.14027). Similarly, the 15 minute interval becomes 0.01042 (15/(60*24))
import pandas as pd
df = pd.read_csv('file.csv')
df2 = pd.DataFrame()
deviceID = df["deviceId"].unique().tolist()
threshold = 0.01042
for id_no in range(0, len(deviceID)):
subset = df[df.deviceId == deviceID[id_no]]
for row in range(len(subset)-1):
difference = subset.iloc[row, 1] - subset.iloc[row 1, 1]
if difference > threshold:
df2 = df2.append(subset.iloc[row])
df2 = df2.append(subset.iloc[row 1])
df2.to_csv('file2.csv)
This works, and I can open the CSV in excel and manually change the float values back to time format, but when I might be dealing with a few hundred CSV files, this becomes impractical,
I've tried this below
import pandas as pd
from datetime import datetime
df = pd.read_csv('file.csv')
df2 = pd.DataFrame()
deviceID = df["deviceId"].unique().tolist()
df['eventTime'].apply(lambda x: datetime.strptime(x, "%H:%M:%S"))
threshold = datetime.strptime("00:15:00", '%H:%M:%S')
for id_no in range(0, len(deviceID)):
subset = df[df.deviceId == deviceID[id_no]]
for row in range(len(subset)-1):
difference = datetime.strptime(subset.iloc[row, 1],'%H:%M:%S') - datetime.strptime(subset.iloc[row 1, 1], '%H:%M:%S')
if difference > threshold:
df2 = df2.append(subset.iloc[row])
df2 = df2.append(subset.iloc[row 1])
df2.to_csv('file2.csv')
but I get the following error:
if difference > threshold:
TypeError: '>' not supported between instances of 'datetime.timedelta' and 'datetime.datetime'
The data looks like this:
| eventTime| deviceId|
| -------- | -------- |
| 15:30:00 | 11234889|
| 15:45:00 | 11234889|
| 16:00:00 | 11234889|
and for different IDs
| eventTime| deviceId|
| -------- | -------- |
| 15:30:00 | 11234890|
| 15:45:00 | 11234890|
| 16:00:00 | 11234890|
CodePudding user response:
threshold
is datetime
and you compare it to timedelta
object (difference
). Did you mean:
from datetime import timedelta
...
threshold = datetime.timedelta(minutes=15)