I'm trying to find consecutive datetimes in Python. I got to the point where I'm able to find if each row conflicts through a loop but am stuck on how to find out if an Event was concurrent. Any suggestions on how I can accomplish this? Open to a similar approach as well!
Concurrent means when Name and Event Date is the same and the count of consecutive conflicts >= 3.
Sample Data 1:
Event ID | Name | Date | Event Start | Event End |
---|---|---|---|---|
123 | Hoper, Charles | 8/4/20 | 8/4/20 8:30 AM | 8/4/20 10:30 AM |
456 | Hoper, Charles | 8/4/20 | 8/4/20 8:50 AM | 8/4/20 9:20 AM |
789 | Hoper, Charles | 8/4/20 | 8/4/20 8:30 AM | 8/4/20 10 AM |
1011 | Perez, Daniel | 8/10/20 | 8/10/20 9 AM | 8/10/20 11 AM |
1213 | Shah, Kim | 8/5/20 | 8/5/20 12 PM | 8/5/20 1 PM |
1415 | Shah, Kim | 8/5/20 | 8/5/20 12:30 PM | 8/5/20 1 PM |
Current Code:
import pandas as pd
import numpy as np
df = pd.read_excel(r'[Path]\TestConcurrent.xlsx')
df['Start'] = df['Event Start']
df['End'] = df['Event End']
df['Start'] = pd.to_datetime(df['Start'], errors='coerce')
df['End'] = pd.to_datetime(df['End'], errors='coerce')
df['conflict'] = len(df) * False
for i, row in df.iterrows():
start, end = "Start", "End"
df.loc[(df['Name'] == row['Name'])
& (((df[start] > row[start]) & (df[start] < row[end])) |
((df[end] > row[start]) & (df[end] < row[end])) |
((row[end] > df[start]) & (row[end] < df[end])) |
((row[start] > df[start]) & (row[start] < df[end]))), 'conflict'] = True
#Edit from answer
df['concurrent'] = df.groupby(['Name','Event Date','conflict'])['Event ID'].transform('count').ge(3)
print(df)
Current Output:
Expected Output:
Update:
Overlapped column determines if Event Start/End date overlap with any other rows if Name == Name. Example, Event 123 and 456 overlap because 456 starts before 123 ends. Concurrent is consecutive conflicts if count >= 3
From suggestion in answer, I tried the below with a little tweek (added 'conflict'). Although resulting true, if I try it on another sample set (sample data 2), it does not work. Sample Data 2 below. It seems to still be counting consecutive Name and Event date without looking at if its a consecutive conflict.
df['concurrent'] = df.groupby(['Name','Event Date','conflict'])['Event ID'].transform('count').ge(3)
Sample Data 2:
Event ID | Name | Date | Event Start | Event End |
---|---|---|---|---|
88 | Cooper, Herbert | 10/20/20 | 10/20/20 8:10 AM | 10/20/20 9:48 AM |
99 | Cooper, Herbert | 10/20/20 | 10/20/20 9:19 AM | 10/20/20 11:30 AM |
10 | Cooper, Herbert | 10/20/20 | 10/20/20 11:52 AM | 10/20/20 1:26 PM |
11 | Cooper, Herbert | 10/20/20 | 10/20/20 1:22 AM | 10/20/20 2:15 PM |
12 | Cooper, Herbert | 10/20/20 | 10/20/20 3:23 PM | 10/20/20 4:10 PM |
Current Output of Sample Data 2:
Expected Output of Sample Data 2:
Just to show you why concurrent should be false, see count of consecutive conflicts column. The first two overlap, which is a set of 2. Then the next two overlap which is another set of 2. None are >= 3.
CodePudding user response:
Ok, final approach! If I understand then you don't care about the "Consecutive" column, you just want to know about the 3 overlapping windows in a row. Here's an approach that tries to answer that question immediately. It passes the two test datasets (thanks for editing!)
import pandas as pd
import numpy as np
import itertools
import io
#Creating the test df you provided
df = pd.read_csv(io.StringIO("""
Event ID;Name;Date;Event Start;Event End
123;Hoper, Charles;8/4/20;8/4/20 8:30 AM;8/4/20 10:30 AM
456;Hoper, Charles;8/4/20;8/4/20 8:50 AM;8/4/20 9:20 AM
789;Hoper, Charles;8/4/20;8/4/20 8:30 AM;8/4/20 10 AM
1011;Perez, Daniel;8/10/20;8/10/20 9 AM;8/10/20 11 AM
1213;Shah, Kim;8/5/20;8/5/20 12 PM;8/5/20 1 PM
1415;Shah, Kim;8/5/20;8/5/20 12:30 PM;8/5/20 1 PM
"""),sep=';')
#Overridding with The second test df
df = pd.read_csv(io.StringIO("""
Event ID;Name;Date;Event Start;Event End
88;Cooper, Herbert;10/20/20;10/20/20 8:10 AM;10/20/20 9:48 AM
99;Cooper, Herbert;10/20/20;10/20/20 9:19 AM;10/20/20 11:30 AM
10;Cooper, Herbert;10/20/20;10/20/20 11:52 AM;10/20/20 1:26 PM
11;Cooper, Herbert;10/20/20;10/20/20 1:22 AM;10/20/20 2:15 PM
12;Cooper, Herbert;10/20/20;10/20/20 3:23 PM;10/20/20 4:10 PM
"""),sep=';')
df['Event Start'] = pd.to_datetime(df['Event Start'])
df['Event End'] = pd.to_datetime(df['Event End'])
df['overlap'] = False
#Iterating line by line keeping track of whether 3 conflicts are found
last_name = None
last_date = None
last_end = pd.Timestamp.max
num_consecutive = 0
for i,r in df.iterrows():
streak_continues = all([
last_name == r['Name'],
last_date == r['Date'],
r['Event Start'] < last_end,
])
if not streak_continues:
if num_consecutive >= 3:
df.loc[
df['Name'].eq(last_name) & df['Date'].eq(last_date),
'overlap'
] = True
num_consecutive = 0
last_name = r['Name']
last_date = r['Date']
last_end = r['Event End']
num_consecutive = 1
df