Home > OS >  How do i use medical codes to determine what disease a person have using jupyter?
How do i use medical codes to determine what disease a person have using jupyter?

Time:03-28

I'm currently trying to use a number of medical codes to find out if a person has a certain disease and would require help as I tried searching for a couple of days but couldn't find any. Hoping someone can help me with this. Considering I've imported excel file 1 into df1 and excel file 2 into df2, how do I use excel file 2 to identify what disease does the patients in excel file 1 have and indicate them with a header? Below is an example of what the data looks like. I'm currently using pandas Jupyter notebook for this.

Excel file 1:

Patient Primary Diagnosis Secondary Diagnosis Secondary Diagnosis 2 Secondary Diagnosis 3
Alex 50322 50111
John 50331 60874 50226 74444
Peter 50226 74444
Peter 50233 88888

Excel File 2:

Primary Diagnosis Medical Code
Diabetes Type 2 50322
Diabetes Type 2 50331
Diabetes Type 2 50233
Cardiovescular Disease 50226
Hypertension 50111
AIDS 60874
HIV 74444
HIV 88888

Intended output:

Patient Positive for Diabetes Type 2 Positive for Cardiovascular Disease Positive for Hypertension Positive for AIDS Positive for HIV
Alex 1 1 0 0 0
John 1 1 0 1 1
Peter 1 1 0 0 1

CodePudding user response:

IIUC, you could melt df1, then map the codes from reshaped df2, finally pivot_table on the output:

diseases = df2.set_index('Medical Code')['Primary Diagnosis']

(df1
 .reset_index()
 .melt(id_vars=['index', 'Patient'])
 .assign(disease=lambda d: d['value'].map(diseases),
         value=1,
        )
 .pivot_table(index='Patient', columns='disease', values='value', fill_value=0)
)

output:

disease  AIDS  Cardiovescular Disease  Diabetes Type 2  HIV  Hypertension
Patient                                                                  
Alex        0                       0                1    0             1
John        1                       1                1    1             0
Peter       0                       1                1    1             0

CodePudding user response:

Maybe you could convert your excel file 2 to some form of key value pair and then replace the primary diagnostics column in file 1 with the corresponding disease name, later apply some form of encoding like one-hot or something similar to file 1. Not sure if this approach would definitely help, but just sharing my thoughts.

CodePudding user response:

You can use merge and pivot_table

out = (
    df1.melt('Patient', var_name='Diagnosis', value_name='Medical Code').dropna()
       .merge(df2, on='Medical Code').assign(dummy=1)
       .pivot_table('dummy', 'Patient', 'Primary Diagnosis', fill_value=0)
       .add_prefix('Positive for ').rename_axis(columns=None).reset_index()
)

Output:

Patient Positive for AIDS Positive for Cardiovescular Disease Positive for Diabetes Type 2 Positive for HIV Positive for Hypertension
Alex 0 0 1 0 1
John 1 1 1 1 0
Peter 0 1 1 1 0
  • Related