Jim and Bob are part of a loan repayment program. They need to be making monthly payments to pay off their loans. Sometimes, a payment can be declined for various reasons. I would like to find when there are 3 consecutive declined payments in a row(so no complete payment in-between any of the 3).
Here are the dataframes:
DF1
| Name | Test | Due Date |
| -----| ---- |------------------------|
| Jim | 1 | 2020-05-10 |
| Bob | 2 | 2021-06-11 |
Here we have the "payment" dataframe.
DF2
| Name | Payment Date | Declined/Complete |
| -----|------------ | -------------------|
| Jim | 2020-03-9 | declined |
| Jim | 2020-04-5 | declined |
| Jim | 2020-05-6 | declined |
| Bob | 2021-03-20 | declined |
| Bob | 2021-04-11 | complete |
| Bob | 2021-05-11 | declined
Because Jim has had 3 consecutive declined payments before his due date, he gets flagged(1).
Bob had a complete payment in-between his last 3 payments, so he does not get flagged(0).
Expected Output
| Name | Due Date | 3 consecutive declines |
| -----|-------------| --------------------------|
| Jim |2020-05-10 |1 |
| Bob |2022-03-10 |0 |
I was thinking of utilizing joins or some group by counts, but I can't seem to figure out how it would all fit together.
Edit: Also, the 3 consecutive declines need to be before the due date.
CodePudding user response:
You can use a groupby
rolling
:
(df2['Declined/Complete']
.eq('declined')
.groupby(df['Name'])
.apply(lambda g: g.rolling(3).sum().eq(3).any()).astype(int)
)
Output:
Name
Bob 0
Jim 1
Name: Declined/Complete, dtype: int64
To filter the dates and assign in df1
, use:
df1['Due Date'] = pd.to_datetime(df1['Due Date'])
df2['Payment Date'] = pd.to_datetime(df2['Payment Date'])
df1['3 cons. dec.'] = df1['Name'].map(df2
.loc[df2['Payment Date'].le(df2['Name']
.map(df1.set_index('Name')['Due Date'])),
'Declined/Complete']
.eq('declined')
.groupby(df2['Name'])
.apply(lambda g: g.rolling(3).sum().eq(3).any()).astype(int)
)
output:
Name Test Due Date 3 cons. dec.
0 Jim 1 2020-05-10 1
1 Bob 2 2021-06-11 0