Home > Mobile >  Comparing two different column values for the records having same primary key value
Comparing two different column values for the records having same primary key value

Time:02-14

I want to compare the two columns (Admit_date and Discharge_date) for same ID and set the value of column Transfer to 'Yes' if admit_date matches with the previous Discharge_date.

INPUT:

ID Admit_Date Discharge_date Transfer
1 31 Jan 2022 1 Feb 2022
1 1 Feb 2022 2 Feb 2022
1 7 Feb 2022 8 Feb 2022
2 2 Feb 2022 2 Feb 2022
3 2 Feb 2022 3 Feb 2022

Input in CSV format:

ID,Admit_date,Discharge_date,Transfer
1,31/01/2022,01/02/2022, 
1,01/02/2022,02/02/2022, 
1,07/02/2022,08/02/2022,
2,02/02/2022,02/02/2022,
3,02/02/2022,03/02/2022,

Output:

ID Admit_Date Discharge_date Transfer
1 31 Jan 2022 1 Feb 2022
1 1 Feb 2022 2 Feb 2022 YES
1 7 Feb 2022 8 Feb 2022
2 2 Feb 2022 2 Feb 2022
3 2 Feb 2022 3 Feb 2022

In CSV format:

ID,Admit_date,Discharge_date,Transfer
1,31/01/2022,01/02/2022, 
1,01/02/2022,02/02/2022, YES
1,07/02/2022,08/02/2022,
2,02/02/2022,02/02/2022,
3,02/02/2022,03/02/2022,

CodePudding user response:

You can use groupby and apply to apply a custom function to each group. The function then needs to compare the Admit_date with the previous row's Discharge_date which can be done using shift. This will give a boolean value (True or False), to get a string value you can use np.where. For example:

import numpy as np

def func(x):
    return x['Admit_date'] == x['Discharge_date'].shift(1)

df['Transfer'] = np.where(df.groupby('ID').apply(func), 'YES', '')

Result:

   ID  Admit_date  Discharge_date  Transfer
0   1  31/01/2022      01/02/2022   
1   1  01/02/2022      02/02/2022       YES
2   1  07/02/2022      08/02/2022   
3   2  02/02/2022      02/02/2022   
4   3  02/02/2022      03/02/2022   
  • Related