Ever Default Tagging - for Credit Risk Modelling- Python-CodePudding

I'm new to python and i faced some difficulty on writing this certain of codes. So to give a background, i want to do an ever-default tagging for credit risk modelling.

0 for non-default accounts and 1 for default accounts

So the idea/concept is that throughout the date performance ( of 12 months), if the specific customer (ID) ever has one event of default (1) then the next performance after that event (for that customer) will be tagged as a default (1) even though the 'default tagging' is 0.

so the input like this:

ID	Date Performance	Default Tag
AAA	2021-03-01	0
AAA	2021-04-01	0
AAA	2021-05-01	0
AAA	2021-06-01	0
AAA	2021-07-01	0
AAA	2021-08-01	0
AAA	2021-09-01	0
AAA	2021-10-01	0
AAA	2021-11-01	0
AAA	2021-12-01	0
AAA	2022-01-01	0
AAA	2022-02-01	0
ABB	2021-03-01	0
ABB	2021-04-01	0
ABB	2021-05-01	0
ABB	2021-06-01	1
ABB	2021-07-01	0
ABB	2021-08-01	0
ABB	2021-09-01	1
ABB	2021-10-01	0
ABB	2021-11-01	0
ABB	2021-12-01	0
ABB	2022-01-01	0
ABB	2022-02-01	0

And the output would be on a new column like this:

ID	Date Performance	Ever Default Tag
AAA	2021-03-01	0
AAA	2021-04-01	0
AAA	2021-05-01	0
AAA	2021-06-01	0
AAA	2021-07-01	0
AAA	2021-08-01	0
AAA	2021-09-01	0
AAA	2021-10-01	0
AAA	2021-11-01	0
AAA	2021-12-01	0
AAA	2022-01-01	0
AAA	2022-02-01	0
ABB	2021-03-01	0
ABB	2021-04-01	0
ABB	2021-05-01	0
ABB	2021-06-01	1
ABB	2021-07-01	1
ABB	2021-08-01	1
ABB	2021-09-01	1
ABB	2021-10-01	1
ABB	2021-11-01	1
ABB	2021-12-01	1
ABB	2022-01-01	1
ABB	2022-02-01	1

CodePudding user response：

Please try this:

df = df.reset_index(drop=True)
for id in df['ID'].unique():
  try:
    df.loc[df['ID']==id,['Ever Default Tag']] = 0
    index_value = list(df[(df['ID']==id)&(df['Default Tag']==1)].index)[0]
    df.loc[index_value:,['Ever Default Tag']] = 1
  except:
    continue
df['Ever Default Tag'] = df['Ever Default Tag'].astype(int)

CodePudding user response：

Here is some code to do what your question asks:

(UPDATED to replace one-line lambda with more readable function for apply, and also to correct an off-by-one error on when default begins.)

import pandas as pd
records = [
    {'ID':'AAA', 'Date Performance': '2021-03-01', 'Default Tag': 0},
    {'ID':'AAA', 'Date Performance': '2021-04-01', 'Default Tag': 0},
    {'ID':'AAA', 'Date Performance': '2021-05-01', 'Default Tag': 0},
    {'ID':'AAA', 'Date Performance': '2021-06-01', 'Default Tag': 0},
    {'ID':'AAA', 'Date Performance': '2021-07-01', 'Default Tag': 0},
    {'ID':'AAA', 'Date Performance': '2021-08-01', 'Default Tag': 0},
    {'ID':'AAA', 'Date Performance': '2021-09-01', 'Default Tag': 0},
    {'ID':'AAA', 'Date Performance': '2021-10-01', 'Default Tag': 0},
    {'ID':'AAA', 'Date Performance': '2021-11-01', 'Default Tag': 0},
    {'ID':'AAA', 'Date Performance': '2021-12-01', 'Default Tag': 0},
    {'ID':'AAA', 'Date Performance': '2022-01-01', 'Default Tag': 0},
    {'ID':'AAA', 'Date Performance': '2022-02-01', 'Default Tag': 0},
    {'ID':'AAB', 'Date Performance': '2021-03-01', 'Default Tag': 0},
    {'ID':'AAB', 'Date Performance': '2021-04-01', 'Default Tag': 0},
    {'ID':'AAB', 'Date Performance': '2021-05-01', 'Default Tag': 0},
    {'ID':'AAB', 'Date Performance': '2021-06-01', 'Default Tag': 1},
    {'ID':'AAB', 'Date Performance': '2021-07-01', 'Default Tag': 0},
    {'ID':'AAB', 'Date Performance': '2021-08-01', 'Default Tag': 0},
    {'ID':'AAB', 'Date Performance': '2021-09-01', 'Default Tag': 1},
    {'ID':'AAB', 'Date Performance': '2021-10-01', 'Default Tag': 0},
    {'ID':'AAB', 'Date Performance': '2021-11-01', 'Default Tag': 0},
    {'ID':'AAB', 'Date Performance': '2021-12-01', 'Default Tag': 0},
    {'ID':'AAB', 'Date Performance': '2022-01-01', 'Default Tag': 0},
    {'ID':'AAB', 'Date Performance': '2022-02-01', 'Default Tag': 0}
]
df = pd.DataFrame(records)
def hasEverDefaulted(df, x):
    id, dt = 'ID', 'Date Performance'
    hasSameIdAndNotLaterDate = (df[id] == x[id]) & (df[dt] <= x[dt])
    return int(sum(df[hasSameIdAndNotLaterDate]['Default Tag']) > 0)
df['Default Tag'] = df.apply(lambda x: hasEverDefaulted(df, x), axis=1)
df.rename(columns={'Default Tag': 'Ever Default Tag'}, inplace=True)
print(df)

Output:

     ID Date Performance  Ever Default Tag
0   AAA       2021-03-01                 0
1   AAA       2021-04-01                 0
2   AAA       2021-05-01                 0
3   AAA       2021-06-01                 0
4   AAA       2021-07-01                 0
5   AAA       2021-08-01                 0
6   AAA       2021-09-01                 0
7   AAA       2021-10-01                 0
8   AAA       2021-11-01                 0
9   AAA       2021-12-01                 0
10  AAA       2022-01-01                 0
11  AAA       2022-02-01                 0
12  AAB       2021-03-01                 0
13  AAB       2021-04-01                 0
14  AAB       2021-05-01                 0
15  AAB       2021-06-01                 1
16  AAB       2021-07-01                 1
17  AAB       2021-08-01                 1
18  AAB       2021-09-01                 1
19  AAB       2021-10-01                 1
20  AAB       2021-11-01                 1
21  AAB       2021-12-01                 1
22  AAB       2022-01-01                 1
23  AAB       2022-02-01                 1