Home > Software engineering >  Subtracting times in a csv for a row by row basis in Python
Subtracting times in a csv for a row by row basis in Python

Time:11-29

I have a CSV that's few thousand rows long. It contains data sent from various devices. They should transmit frequently (every 10 minutes) however sometimes there is a lag. I'm trying to write a program that will highlight all instances where the delay between two readings is greater than 15 minutes

I've made a functional code that works, but with this code I first have to manually edit the CSV to change the "eventTime" variable from time format (e.g. 03:22:00) to a float value based on 1/24 (e.g. 03:22:00 becomes 0.14027). Similarly, the 15 minute interval becomes 0.01042 (15/(60*24))

import pandas as pd

df = pd.read_csv('file.csv')
df2 = pd.DataFrame()
deviceID = df["deviceId"].unique().tolist()
threshold = 0.01042
for id_no in range(0, len(deviceID)):
    subset = df[df.deviceId == deviceID[id_no]]
    for row in range(len(subset)-1):
        difference = subset.iloc[row, 1] - subset.iloc[row 1, 1]
        if difference > threshold:
            df2 = df2.append(subset.iloc[row])
            df2 = df2.append(subset.iloc[row 1])

df2.to_csv('file2.csv)

This works, and I can open the CSV in excel and manually change the float values back to time format, but when I might be dealing with a few hundred CSV files, this becomes impractical,

I've tried this below

import pandas as pd
from datetime import datetime

df = pd.read_csv('file.csv')
df2 = pd.DataFrame()
deviceID = df["deviceId"].unique().tolist()
df['eventTime'].apply(lambda x: datetime.strptime(x, "%H:%M:%S"))
threshold = datetime.strptime("00:15:00", '%H:%M:%S')
for id_no in range(0, len(deviceID)):
    subset = df[df.deviceId == deviceID[id_no]]
    for row in range(len(subset)-1):
        difference = datetime.strptime(subset.iloc[row, 1],'%H:%M:%S') - datetime.strptime(subset.iloc[row 1, 1], '%H:%M:%S')
        if difference > threshold:
            df2 = df2.append(subset.iloc[row])
            df2 = df2.append(subset.iloc[row 1])

df2.to_csv('file2.csv')

but I get the following error:

if difference > threshold:
TypeError: '>' not supported between instances of 'datetime.timedelta' and 'datetime.datetime'

The data looks like this:

| eventTime|  deviceId|
| -------- | -------- |
| 15:30:00 |  11234889|
| 15:45:00 |  11234889|
| 16:00:00 |  11234889|

and for different IDs

| eventTime|  deviceId|
| -------- | -------- |
| 15:30:00 |  11234890|
| 15:45:00 |  11234890|
| 16:00:00 |  11234890|

CodePudding user response:

threshold is datetime and you compare it to timedelta object (difference). Did you mean:

from datetime import timedelta
...
threshold = datetime.timedelta(minutes=15)
  • Related