I have a data frame that is sorted by CLIENT_ID
and ENCOUNTER_DATE
as shown below:
CLIENT_ID | ENCOUNTER_DATE | STAGE |
---|---|---|
8222 | 2020-01-01 | 1 |
8222 | 2020-03-02 | 1 |
8222 | 2020-04-18 | 2 |
8222 | 2020-07-31 | 1 |
8300 | 2017-06-10 | 1 |
8300 | 2017-09-11 | 2 |
8300 | 2018-02-01 | 2 |
8300 | 2018-04-01 | 3 |
8300 | 2018-05-31 | 4 |
8400 | 2020-12-31 | 1 |
8401 | 2017-08-29 | 1 |
8401 | 2017-09-15 | 3 |
8500 | 2018-10-10 | 2 |
How can I create a new column (flag column) which indicates whether or not the STAGE
from the previous DATE_ENCOUNTER
for each CLIENT_ID
is greater than the current DATE_ENCOUNTER
which would result in the table below:
CLIENT_ID | ENCOUNTER_DATE | STAGE | STAGE_WORSENED |
---|---|---|---|
8222 | 2020-01-01 | 1 | 0 |
8222 | 2020-03-02 | 1 | 0 |
8222 | 2020-04-18 | 2 | 1 |
8222 | 2020-07-31 | 1 | 0 |
8300 | 2017-06-10 | 1 | 0 |
8300 | 2017-09-11 | 2 | 1 |
8300 | 2018-02-01 | 2 | 0 |
8300 | 2018-04-01 | 3 | 1 |
8300 | 2018-05-31 | 4 | 1 |
8400 | 2020-12-31 | 1 | 0 |
8401 | 2017-08-29 | 1 | 0 |
8401 | 2017-09-15 | 3 | 1 |
8500 | 2018-10-10 | 2 | 0 |
Here is the code to generate the df:
df = pd.DataFrame({"CLIENT_ID": [8222, 8222, 8222, 8222, 8300, 8300, 8300, 8300, 8300, 8400, 8401, 8401, 8500],
"ENCOUNTER_DATE": ['2020-01-01', '2020-03-02', '2020-04-18', '2020-07-31', '2017-06-10', '2017-09-11', '2018-02-01', '2018-04-01', '2018-05-31', '2020-12-31', '2017-08-29', '2017-09-15', '2018-10-10'],
"STAGE": [1, 1, 2, 1, 1, 2, 2, 3, 4, 1, 1, 3, 2]})
CodePudding user response:
Use DataFrameGroupBy.diff
with compare for greater or equal by 1
and cast to integers:
df['STAGE_WORSENED'] = df.groupby('CLIENT_ID')['STAGE'].diff().ge(1).astype(int)
print (df)
CLIENT_ID ENCOUNTER_DATE STAGE STAGE_WORSENED
0 8222 2020-01-01 1 0
1 8222 2020-03-02 1 0
2 8222 2020-04-18 2 1
3 8222 2020-07-31 1 0
4 8300 2017-06-10 1 0
5 8300 2017-09-11 2 1
6 8300 2018-02-01 2 0
7 8300 2018-04-01 3 1
8 8300 2018-05-31 4 1
9 8400 2020-12-31 1 0
10 8401 2017-08-29 1 0
11 8401 2017-09-15 3 1
12 8500 2018-10-10 2 0