I want to check for time gaps in test_date_time
column:
device_id eventDateTime firstPacketDateTime lastPacketDateTime test_date_time
MC2C5HRC0KF445779 27/12/2021 0:45 27/12/2021 0:45 27/12/2021 0:59 0:45
MC2C5HRC0KF445779 27/12/2021 1:00 27/12/2021 1:00 27/12/2021 1:13 1:00
MC2C5HRC0KF445779 27/12/2021 1:15 27/12/2021 1:16 27/12/2021 1:29 1:15
MC2C5HRC0KF445779 27/12/2021 1:30 27/12/2021 1:30 27/12/2021 1:42 1:30
MC2H3JRC0LC178168 27/12/2021 10:45 27/12/2021 10:46 27/12/2021 10:56 10:45
MC2H3JRC0LC178168 27/12/2021 11:15 27/12/2021 11:15 27/12/2021 11:25 11:15
MC2H3JRC0LC178168 27/12/2021 11:30 27/12/2021 11:35 27/12/2021 11:35 11:30
Time should be increasing from 0:00 to 23:45 with each index of the column have a value with gap of 15 mins. Eg:
test_date_time
0:00
0:15
0:30
0:45
1:00
1:15
1:30
I want to separate out those rows having time gap of more than 15 mins and store them inside another dataframe.
How can I implement this without using a for loop?
CodePudding user response:
df = pd.read_csv(r"C:\...\input.csv")
# Creating new column with the type casted times
df["test_dt"] = pd.to_timedelta(df['test_date_time'] ":00")
# Another new column with just the difference of the consecutive values
df["delta_test_dt"] = df["test_dt"].diff()
# Now creating a timedelta object for the 15 minutes gap
delta = pd.to_timedelta("15m")
# Filtering the dataframe
new_df = df[df["delta_test_dt"] > delta]
Ref: https://pandas.pydata.org/docs/reference/api/pandas.to_timedelta.html
This will store rows 4 and 5 in the new data frame (indexing from 0).
CodePudding user response:
Convert eventDateTime
(or test_date_time
) to datetime types. Then use df.shift()
to create a new column prev
which is the evenDataTime
's previous value. Then filter on those which are more than 15 mins apart. (Note that you should probably do this by device_id
and not all rows together.)
import datetime
df['eventDateTime'] = pd.to_datetime(df['eventDateTime'])
df['prev'] = df['eventDateTime'].shift()
df[df['eventDateTime'] - df['prev'] > datetime.timedelta(minutes=15)]
Result, which you can assign to a new dataframe:
device_id eventDateTime firstPacketDateTime lastPacketDateTime test_date_time prev
4 MC2H3JRC0LC178168 2021-12-27 10:45:00 27/12/2021 10:46 27/12/2021 10:56 10:45 2021-12-27 01:30:00
5 MC2H3JRC0LC178168 2021-12-27 11:15:00 27/12/2021 11:15 27/12/2021 11:25 11:15 2021-12-27 10:45:00
If you don't want a new column prev
, then drop it after or:
df['eventDateTime'] = pd.to_datetime(df['eventDateTime'])
df[df['eventDateTime'] - df['eventDateTime'].shift() > datetime.timedelta(minutes=15)]
(that can also be reduced to one very unreadable line:
df[pd.to_datetime(df['eventDateTime']) - pd.to_datetime(df['eventDateTime']).shift() > datetime.timedelta(minutes=15)]
And with test_date_time
column:
df[pd.to_datetime(df['test_date_time']) - pd.to_datetime(df['test_date_time']).shift() > \
datetime.timedelta(minutes=15)]