Suppose I have a dataframe:
Case_no | Event_ID | Date | Type | Done_by |
---|---|---|---|---|
1 | 1 | 2023-01-01 | Email Sent | Customer |
1 | 2 | 2023-01-02 | Response | Agent |
1 | 3 | 2023-01-03 | Email Sent | Customer |
1 | 4 | 2023-01-10 | Response | Agent |
2 | 5 | 2023-01-02 | Email Sent | Customer |
2 | 6 | 2023-01-04 | Email Sent | Customer |
And I wanted to add two boolean columns that check for each case if an email has been sent in, has there been a response within 5 days on that case. There is no way to link emails and responses but just having it flag up that a response has been sent and whether or not it was in 5 days after is good enough for what I am doing as such:
Case_no | Event_ID | Date | Type | Done_by | Response? | Response_Within_Five_Days |
---|---|---|---|---|---|---|
1 | 1 | 2023-01-01 | Email Sent | Customer | TRUE | TRUE |
1 | 2 | 2023-01-02 | Response | Agent | nan | NAN |
1 | 3 | 2023-01-03 | Email Sent | Customer | TRUE | FALSE |
1 | 4 | 2023-01-10 | Response | Agent | NAN | NAN |
2 | 5 | 2023-01-02 | Email Sent | Customer | FALSE | FALSE |
2 | 6 | 2023-01-04 | Email Sent | Customer | FALSE | FALSE |
How would I go about doing that?
CodePudding user response:
Solution for compare per groups if next value after Email Sent
is Response
by DataFrameGroupBy.shift
:
df['Date'] = pd.to_datetime(df['Date'])
shift = df.groupby('Case_no')[['Type','Date']].shift(-1)
m1 = df['Type'].eq('Email Sent')
m2 = shift['Type'].eq('Response')
m3 = shift['Date'].sub(df['Date']).dt.days.lt(5)
df['Response?'] = (m1 & m2).astype('boolean')
df['Response_Within_Five_Days'] = df['Response?'] & m3
df.loc[df['Type'].eq('Response'), ['Response?','Response_Within_Five_Days']] = np.nan
print (df)
Case_no Event_ID Date Type Done_by Response? \
0 1 1 2023-01-01 Email Sent Customer True
1 1 2 2023-01-02 Response Agent <NA>
2 1 3 2023-01-03 Email Sent Customer True
3 1 4 2023-01-10 Response Agent <NA>
4 2 5 2023-01-02 Email Sent Customer False
5 2 6 2023-01-04 Email Sent Customer False
Response_Within_Five_Days
0 True
1 <NA>
2 False
3 <NA>
4 False
5 False
EDIT: For check next rows after Email Sent
use custom function:
def f(x):
m = x['Type'].eq('Email Sent')
for k, v in x.loc[m, 'Date'].items():
s = x[~m].loc[k:, 'Date'].sub(v).dt.days
x.loc[k, 'Response?'] = not s.empty
x.loc[k, 'Response_Within_Five_Days'] = s.lt(5).any()
return x
df = df.reset_index(drop=True)
df = df.groupby('Case_no').apply(f)
print (df)
Case_no Event_ID Date Type Done_by Response? \
0 1 1 2023-01-01 Email Sent Customer True
1 1 2 2023-01-02 Response Agent NaN
2 1 3 2023-01-03 Email Sent Customer True
3 1 4 2023-01-10 Response Agent NaN
4 2 5 2023-01-02 Email Sent Customer False
5 2 6 2023-01-04 Email Sent Customer False
Response_Within_Five_Days
0 True
1 NaN
2 False
3 NaN
4 False
5 False