How to check conditions for paired variables-CodePudding

I have a dataset, with patient records that have either had consults or not indicated by 7 variables (FITM1-7, I will only showcase 2, FITM1, FITM2) having a particular code (I provide a list of codes, msp_codes).

My conditions are to check each record for the occurrence of the codes and then check whether the associated date of which consult took place (SERVDT1-7) was between the admission date (ADMIT_DATE) and surgery date (SURG_DATE), if so I want to create an additional variable med_consult = 1 or med_consult = 0

My issue is how to check for these pair of variables for the conditions (FITM1-7, SERVDT1-7) and ignore the irrelevant variable (CLMSPEC1-7) that comes between events, I have again used np.select to do something similar but found that I was writing a lot of repetitive code to account for all conditions but would rather a better way of maybe melting melt() the dataset and doing it this way.

msp_codes = [310, 311, 0305, 308, 910]

# Input
patients = [
    ("1", '2022-11-13', '2022-11-15', "310", "2022-11-14", "M3200", "00456", "2022-11-14", "G456"),
    ("2", '2022-11-10', '2022-11-22', "007", "2022-11-14", "M05", "0033", "2022-11-20", "G456"),
    ("3", '2022-11-09', '2022-11-13', "309", "2022-10-14", "N058", "308", "2021-10-02", "F70"),
]
labels = ["ID", 'ADMIT_DATE', 'SURG_DATE', "FITM1", "SERVDT1", "CLMSPEC1", "FITM2", "SERVDT2", "CLMSPEC2"]
df_patients = pd.DataFrame.from_records(patients, columns=labels)

date_cols = ['ADMIT_DATE', 'SURG_DATE', 'SERVDT1', 'SERVDT2']

df_patients[date_cols] = df_patients[date_cols].apply(pd.to_datetime, errors='coerce')

df_patients

ID  ADMIT_DATE  SURG_DATE   FITM1   SERVDT1     CLMSPEC1  FITM2  SERVDT2      CLMSPEC2
1   2022-11-13  2022-11-15  310     2022-11-14  M3200     00456  2022-11-14   G456
2   2022-11-10  2022-11-22  007     2022-11-14  M05       0033   2022-11-20   G456
3   2022-11-09  2022-11-13  309     2022-10-14  N058      308    2021-10-02   F70


# Output
ID ADMIT_DATE SURG_DATE  FITM1 SERVDT1    CLMSPEC1 FITM2  SERVDT2    CLMSPEC2 med_consult
1  2022-11-13 2022-11-15 310   2022-11-14 M3200    00456  2022-11-14 G456     1
2  2022-11-10 2022-11-22 007   2022-11-14 M05      0033   2022-11-20 G456     0
3  2022-11-09 2022-11-13 309   2022-10-14 N058     308    2021-10-02 F70      0

Row 1 meets both the condition of having a MSP code in the list 310 and a associated date within the admission date and surgery date.
Row 2 doesn't meet the criteria of having an associated MSP code, so dates aren't checked.
Row 3 has an associated MSP code but the date associated doesn't fall within admission date and surgery date.

CodePudding user response：

Setup, note how I've properly formatted msp_codes and modified how the date_cols are made:

msp_codes = ['310', '311', '0305', '308', '910']

patients = [
    ("1", '2022-11-13', '2022-11-15', "310", "2022-11-14", "M3200", "00456", "2022-11-14", "G456"),
    ("2", '2022-11-10', '2022-11-22', "007", "2022-11-14", "M05", "0033", "2022-11-20", "G456"),
    ("3", '2022-11-09', '2022-11-13', "309", "2022-10-14", "N058", "308", "2021-10-02", "F70"),
]
labels = ["ID", 'ADMIT_DATE', 'SURG_DATE', "FITM1", "SERVDT1", "CLMSPEC1", "FITM2", "SERVDT2", "CLMSPEC2"]
df_patients = pd.DataFrame.from_records(patients, columns=labels)

date_cols = ['ADMIT_DATE', 'SURG_DATE', 'SERVDT1', 'SERVDT2']

for col in date_cols:
    df_patients[col] = pd.to_datetime(df_patients[col])

Doing:

mask = pd.DataFrame()
for servdt, fitm in zip(df_patients.filter(like='SERVDT'), df_patients.filter(like='FITM')):
    sub_mask = (df_patients[servdt].between(df_patients.ADMIT_DATE, df_patients.SURG_DATE) 
              & df_patients[fitm].isin(msp_codes))
    mask = pd.concat([mask, sub_mask], axis=1)

df_patients['med_consult'] = 0
df_patients.loc[mask.any(axis=1), 'med_consult'] = 1

Output:

  ID ADMIT_DATE  SURG_DATE FITM1    SERVDT1 CLMSPEC1  FITM2    SERVDT2 CLMSPEC2  med_consult
0  1 2022-11-13 2022-11-15   310 2022-11-14    M3200  00456 2022-11-14     G456            1
1  2 2022-11-10 2022-11-22   007 2022-11-14      M05   0033 2022-11-20     G456            0
2  3 2022-11-09 2022-11-13   309 2022-10-14     N058    308 2021-10-02      F70            0

Alternative method using pd.wide_to_long:

df = pd.wide_to_long(df_patients, stubnames=['FITM', 'SERVDT', 'CLMSPEC'], i=['ID', 'ADMIT_DATE', 'SURG_DATE'], j='med_consult').reset_index()
df = df[df.FITM.isin(msp_codes) & df.SERVDT.between(df.ADMIT_DATE, df.SURG_DATE)]
df = df.groupby(['ID'],as_index=False)['med_consult'].first()
df.med_consult = 1
df_patients = df_patients.merge(df, 'left').fillna(0)

Output:

  ID ADMIT_DATE  SURG_DATE FITM1    SERVDT1 CLMSPEC1  FITM2    SERVDT2 CLMSPEC2  med_consult
0  1 2022-11-13 2022-11-15   310 2022-11-14    M3200  00456 2022-11-14     G456          1.0
1  2 2022-11-10 2022-11-22   007 2022-11-14      M05   0033 2022-11-20     G456          0.0
2  3 2022-11-09 2022-11-13   309 2022-10-14     N058    308 2021-10-02      F70          0.0