Home > Net >  How to check specific columns for values and assign weighted integer values when checking against va
How to check specific columns for values and assign weighted integer values when checking against va

Time:05-26

I have a dataset containing diagnosis columns (DIAGX1-DIAGX42) for patients and I need to create a variable that sums the values for these based on weights from an external index.

df_patients

patients = [('pat1', 'Z509', 'M33', 'M32', 'M315'),
         ('pat2', 'I099', 'I278', 'M05', 'F01'),
         ('pat3', 'N057', 'N057', 'N058', 'N057')]
labels = ['patient_num', 'DIAGX1', 'DIAGX2', 'DIAGX3', 'DIAGX4']
df_patients = pd.DataFrame.from_records(patients, columns=labels)
df_patients

Input
patient_num DIAGX1  DIAGX2  DIAGX3  DIAGX4
pat1        Z509    M33     M32     M315
pat2        I099    I278    M05     F01
pat3        N057    N057    N058    N057

Output
patient_num DIAGX1  DIAGX2  DIAGX3  DIAGX4 Score
pat1        Z509    M33     M32     M315   1
pat2        I099    I278    M05     F01    6
pat3        N057    N057    N058    N057   0

external_index, where if a column from the dataset above contains a value in any of the below that the value would be added. Only one member contributes to a value been given, e.g a value of both F01, F02 both in dementia will only result in 2 being allocated for that record/patient, values are only added/summed if they occur across grouped indexes e.g. F01=2 and I099=1 sum to 3

  1. congestive_heart_failure = 2
  2. dementia = 2
  3. chronic_pulmonary_disease= 1
  4. rheumatologic_disease = 1
congestive_heart_failure = [
    "I099",
    "I255",
    "I420",
    "I425",
    "I426",
    "I427",
    "I428",
    "I429",
    "I43",
    "I50",
    "P290",
]
dementia = ["F01", "F02", "F03", "F051", "G30", "G311"]
chronic_pulmonary_disease = [
    "I278",
    "I279",
    "J40",
    "J41",
    "J42",
    "J43",
    "J44",
    "J45",
    "J47",
    "J60",
    "J61",
    "J62",
    "J63",
    "J64",
    "J65",
    "J66",
    "J67",
    "J684",
    "J701",
    "J703",
]
rheumatologic_disease = [
    "M05",
    "M06",
    "M315",
    "M32",
    "M33",
    "M34",
    "M351",
    "M353",
    "M360",
]

CodePudding user response:

Annotated code

idx = {
    'dementia': dementia,
    'rheumatologic_disease': rheumatologic_disease,
    'congestive_heart_failure': congestive_heart_failure,
    'chronic_pulmonary_disease': chronic_pulmonary_disease,
}
mapping = {v: k for k, vals in idx.items() for v in vals}

weights = {
    'dementia': 2,
    'rheumatologic_disease': 1,
    'congestive_heart_failure': 2,
    'chronic_pulmonary_disease': 1,
}

# Convert the dataframe into long format
df = df_patients.melt('patient_num')

# Substitute disease name inplace of codes
df['value'] = df['value'].map(mapping)

# Drop dupes per patient and disease
df = df.drop_duplicates(['patient_num', 'value'])

# Map the weights assigned to diseases
df['value'] = df['value'].map(weights)

# Sum the weights per patient and map it back to original dataframe
df_patients['Score'] = df['patient_num'].map(df.groupby('patient_num')['value'].sum())

Result

  patient_num DIAGX1 DIAGX2 DIAGX3 DIAGX4  Score
0        pat1   Z509    M33    M32   M315    1.0
1        pat2   I099   I278    M05    F01    6.0
2        pat3   N057   N057   N058   N057    0.0

CodePudding user response:

You can do it this way and probably a few other ways more efficiently:

chf_dict = dict(zip(congestive_heart_failure,['chf']*len(congestive_heart_failure)))
dementia_dict = dict(zip(dementia,['dem']*len(dementia)))
cpd_dict = dict(zip(chronic_pulmonary_disease,['cpd']*len(chronic_pulmonary_disease)))
rd_dict = dict(zip(rheumatologic_disease,['rd']*len(rheumatologic_disease)))
          
disease_map = chf_dict
disease_map.update(dementia_dict)
disease_map.update(cpd_dict)
disease_map.update(rd_dict)

score_dict = {'cpd':1, 
              'chf':2, 
              'rd':1, 
              'dem':2}

score_df = df_patients.set_index('patient_num').stack().map(disease_map)\
           .droplevel(1).reset_index(name='disease')\
           .drop_duplicates().set_index('patient_num')['disease']\
           .map(score_dict)\
           .groupby(level=0).sum().rename('Score')

df_patients.merge(score_df, left_on='patient_num', right_index=True)

Output:

  patient_num DIAGX1 DIAGX2 DIAGX3 DIAGX4  Score
0        pat1   Z509    M33    M32   M315    1.0
1        pat2   I099   I278    M05    F01    6.0
2        pat3   N057   N057   N058   N057    0.0
  • Related