I'm new to python and i faced some difficulty on writing this certain of codes. So to give a background, i want to do an ever-default tagging for credit risk modelling.
0 for non-default accounts and 1 for default accounts
So the idea/concept is that throughout the date performance ( of 12 months), if the specific customer (ID) ever has one event of default (1) then the next performance after that event (for that customer) will be tagged as a default (1) even though the 'default tagging' is 0.
so the input like this:
ID | Date Performance | Default Tag |
---|---|---|
AAA | 2021-03-01 | 0 |
AAA | 2021-04-01 | 0 |
AAA | 2021-05-01 | 0 |
AAA | 2021-06-01 | 0 |
AAA | 2021-07-01 | 0 |
AAA | 2021-08-01 | 0 |
AAA | 2021-09-01 | 0 |
AAA | 2021-10-01 | 0 |
AAA | 2021-11-01 | 0 |
AAA | 2021-12-01 | 0 |
AAA | 2022-01-01 | 0 |
AAA | 2022-02-01 | 0 |
ABB | 2021-03-01 | 0 |
ABB | 2021-04-01 | 0 |
ABB | 2021-05-01 | 0 |
ABB | 2021-06-01 | 1 |
ABB | 2021-07-01 | 0 |
ABB | 2021-08-01 | 0 |
ABB | 2021-09-01 | 1 |
ABB | 2021-10-01 | 0 |
ABB | 2021-11-01 | 0 |
ABB | 2021-12-01 | 0 |
ABB | 2022-01-01 | 0 |
ABB | 2022-02-01 | 0 |
And the output would be on a new column like this:
ID | Date Performance | Ever Default Tag |
---|---|---|
AAA | 2021-03-01 | 0 |
AAA | 2021-04-01 | 0 |
AAA | 2021-05-01 | 0 |
AAA | 2021-06-01 | 0 |
AAA | 2021-07-01 | 0 |
AAA | 2021-08-01 | 0 |
AAA | 2021-09-01 | 0 |
AAA | 2021-10-01 | 0 |
AAA | 2021-11-01 | 0 |
AAA | 2021-12-01 | 0 |
AAA | 2022-01-01 | 0 |
AAA | 2022-02-01 | 0 |
ABB | 2021-03-01 | 0 |
ABB | 2021-04-01 | 0 |
ABB | 2021-05-01 | 0 |
ABB | 2021-06-01 | 1 |
ABB | 2021-07-01 | 1 |
ABB | 2021-08-01 | 1 |
ABB | 2021-09-01 | 1 |
ABB | 2021-10-01 | 1 |
ABB | 2021-11-01 | 1 |
ABB | 2021-12-01 | 1 |
ABB | 2022-01-01 | 1 |
ABB | 2022-02-01 | 1 |
CodePudding user response:
Please try this:
df = df.reset_index(drop=True)
for id in df['ID'].unique():
try:
df.loc[df['ID']==id,['Ever Default Tag']] = 0
index_value = list(df[(df['ID']==id)&(df['Default Tag']==1)].index)[0]
df.loc[index_value:,['Ever Default Tag']] = 1
except:
continue
df['Ever Default Tag'] = df['Ever Default Tag'].astype(int)
CodePudding user response:
Here is some code to do what your question asks:
(UPDATED to replace one-line lambda with more readable function for apply
, and also to correct an off-by-one error on when default begins.)
import pandas as pd
records = [
{'ID':'AAA', 'Date Performance': '2021-03-01', 'Default Tag': 0},
{'ID':'AAA', 'Date Performance': '2021-04-01', 'Default Tag': 0},
{'ID':'AAA', 'Date Performance': '2021-05-01', 'Default Tag': 0},
{'ID':'AAA', 'Date Performance': '2021-06-01', 'Default Tag': 0},
{'ID':'AAA', 'Date Performance': '2021-07-01', 'Default Tag': 0},
{'ID':'AAA', 'Date Performance': '2021-08-01', 'Default Tag': 0},
{'ID':'AAA', 'Date Performance': '2021-09-01', 'Default Tag': 0},
{'ID':'AAA', 'Date Performance': '2021-10-01', 'Default Tag': 0},
{'ID':'AAA', 'Date Performance': '2021-11-01', 'Default Tag': 0},
{'ID':'AAA', 'Date Performance': '2021-12-01', 'Default Tag': 0},
{'ID':'AAA', 'Date Performance': '2022-01-01', 'Default Tag': 0},
{'ID':'AAA', 'Date Performance': '2022-02-01', 'Default Tag': 0},
{'ID':'AAB', 'Date Performance': '2021-03-01', 'Default Tag': 0},
{'ID':'AAB', 'Date Performance': '2021-04-01', 'Default Tag': 0},
{'ID':'AAB', 'Date Performance': '2021-05-01', 'Default Tag': 0},
{'ID':'AAB', 'Date Performance': '2021-06-01', 'Default Tag': 1},
{'ID':'AAB', 'Date Performance': '2021-07-01', 'Default Tag': 0},
{'ID':'AAB', 'Date Performance': '2021-08-01', 'Default Tag': 0},
{'ID':'AAB', 'Date Performance': '2021-09-01', 'Default Tag': 1},
{'ID':'AAB', 'Date Performance': '2021-10-01', 'Default Tag': 0},
{'ID':'AAB', 'Date Performance': '2021-11-01', 'Default Tag': 0},
{'ID':'AAB', 'Date Performance': '2021-12-01', 'Default Tag': 0},
{'ID':'AAB', 'Date Performance': '2022-01-01', 'Default Tag': 0},
{'ID':'AAB', 'Date Performance': '2022-02-01', 'Default Tag': 0}
]
df = pd.DataFrame(records)
def hasEverDefaulted(df, x):
id, dt = 'ID', 'Date Performance'
hasSameIdAndNotLaterDate = (df[id] == x[id]) & (df[dt] <= x[dt])
return int(sum(df[hasSameIdAndNotLaterDate]['Default Tag']) > 0)
df['Default Tag'] = df.apply(lambda x: hasEverDefaulted(df, x), axis=1)
df.rename(columns={'Default Tag': 'Ever Default Tag'}, inplace=True)
print(df)
Output:
ID Date Performance Ever Default Tag
0 AAA 2021-03-01 0
1 AAA 2021-04-01 0
2 AAA 2021-05-01 0
3 AAA 2021-06-01 0
4 AAA 2021-07-01 0
5 AAA 2021-08-01 0
6 AAA 2021-09-01 0
7 AAA 2021-10-01 0
8 AAA 2021-11-01 0
9 AAA 2021-12-01 0
10 AAA 2022-01-01 0
11 AAA 2022-02-01 0
12 AAB 2021-03-01 0
13 AAB 2021-04-01 0
14 AAB 2021-05-01 0
15 AAB 2021-06-01 1
16 AAB 2021-07-01 1
17 AAB 2021-08-01 1
18 AAB 2021-09-01 1
19 AAB 2021-10-01 1
20 AAB 2021-11-01 1
21 AAB 2021-12-01 1
22 AAB 2022-01-01 1
23 AAB 2022-02-01 1