I want to compare the two columns (Admit_date and Discharge_date) for same ID and set the value of column Transfer to 'Yes' if admit_date matches with the previous Discharge_date.
INPUT:
ID | Admit_Date | Discharge_date | Transfer |
---|---|---|---|
1 | 31 Jan 2022 | 1 Feb 2022 | |
1 | 1 Feb 2022 | 2 Feb 2022 | |
1 | 7 Feb 2022 | 8 Feb 2022 | |
2 | 2 Feb 2022 | 2 Feb 2022 | |
3 | 2 Feb 2022 | 3 Feb 2022 |
Input in CSV format:
ID,Admit_date,Discharge_date,Transfer
1,31/01/2022,01/02/2022,
1,01/02/2022,02/02/2022,
1,07/02/2022,08/02/2022,
2,02/02/2022,02/02/2022,
3,02/02/2022,03/02/2022,
Output:
ID | Admit_Date | Discharge_date | Transfer |
---|---|---|---|
1 | 31 Jan 2022 | 1 Feb 2022 | |
1 | 1 Feb 2022 | 2 Feb 2022 | YES |
1 | 7 Feb 2022 | 8 Feb 2022 | |
2 | 2 Feb 2022 | 2 Feb 2022 | |
3 | 2 Feb 2022 | 3 Feb 2022 |
In CSV format:
ID,Admit_date,Discharge_date,Transfer
1,31/01/2022,01/02/2022,
1,01/02/2022,02/02/2022, YES
1,07/02/2022,08/02/2022,
2,02/02/2022,02/02/2022,
3,02/02/2022,03/02/2022,
CodePudding user response:
You can use groupby
and apply
to apply a custom function to each group. The function then needs to compare the Admit_date
with the previous row's Discharge_date
which can be done using shift
. This will give a boolean value (True or False), to get a string value you can use np.where
. For example:
import numpy as np
def func(x):
return x['Admit_date'] == x['Discharge_date'].shift(1)
df['Transfer'] = np.where(df.groupby('ID').apply(func), 'YES', '')
Result:
ID Admit_date Discharge_date Transfer
0 1 31/01/2022 01/02/2022
1 1 01/02/2022 02/02/2022 YES
2 1 07/02/2022 08/02/2022
3 2 02/02/2022 02/02/2022
4 3 02/02/2022 03/02/2022