I have 12 classifications, each containing multiple codes (I show only 2 in this example, dementia and solid tumour):
condition: codes
dementia: F01, F02, F03, F051, G30, G311
solid tumour: C77, C78, C79, C80
I want to add a column for each of these 12 conditions that checks whether a patient has any code for that condition: 1 if yes, 0 if no.
import pandas as pd

patients = [('pat1', 'C77', 'F01', 'M32', 'M315'),
            ('pat2', 'I099', 'I278', 'M05', 'F01'),
            ('pat3', 'N057', 'N057', 'N058', 'N057')]
labels = ['patient_num', 'DIAGX1', 'DIAGX2', 'DIAGX3', 'DIAGX4']
df_patients = pd.DataFrame.from_records(patients, columns=labels)
df_patients
Input
patient_num  DIAGX1  DIAGX2  DIAGX3  DIAGX4
pat1         C77     F01     M32     M315
pat2         I099    I278    M05     F01
pat3         N057    N057    N058    N057
Output
patient_num  DIAGX1  DIAGX2  DIAGX3  DIAGX4  dementia_yn  tumour_yn
pat1         C77     F01     M32     M315    1            1
pat2         I099    I278    M05     F01     1            0
pat3         N057    N057    N058    N057    0            0
I have previously used np.select(conditions, values) to create a single column based on conditions, but would appreciate help creating multiple columns dependent on conditions.
CodePudding user response:
You can store the conditions/codes in a dictionary, loop over it, and use isin followed by any(axis=1) to check whether any code from each condition appears in each row of the dataframe:
all_codes = {
'dementia': ['F01', 'F02', 'F03', 'F051', 'G30', 'G311'],
'solid_tumour': ['C77', 'C78', 'C79', 'C80'],
}
for condition, codes in all_codes.items():
    df_patients[condition + '_yn'] = df_patients.isin(codes).any(axis=1).astype(int)
Output:
>>> df_patients
  patient_num DIAGX1 DIAGX2 DIAGX3 DIAGX4  dementia_yn  solid_tumour_yn
0        pat1    C77    F01    M32   M315            1                1
1        pat2   I099   I278    M05    F01            1                0
2        pat3   N057   N057   N058   N057            0                0
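Calling isin on the whole dataframe also checks patient_num and any _yn columns added on earlier iterations. That is harmless with this data, but if you would rather be explicit, here is a self-contained sketch that restricts the check to the diagnosis columns. It assumes every diagnosis column name starts with 'DIAGX'; the diag_cols name is just a helper introduced for this example, so adjust both to your real data.

```python
import pandas as pd

# Sample data from the question
patients = [
    ('pat1', 'C77', 'F01', 'M32', 'M315'),
    ('pat2', 'I099', 'I278', 'M05', 'F01'),
    ('pat3', 'N057', 'N057', 'N058', 'N057'),
]
labels = ['patient_num', 'DIAGX1', 'DIAGX2', 'DIAGX3', 'DIAGX4']
df_patients = pd.DataFrame.from_records(patients, columns=labels)

# One entry per condition; extend this dict to cover all 12 classifications
all_codes = {
    'dementia': ['F01', 'F02', 'F03', 'F051', 'G30', 'G311'],
    'solid_tumour': ['C77', 'C78', 'C79', 'C80'],
}

# Assumption: diagnosis columns are exactly those whose name starts with 'DIAGX'
diag_cols = [c for c in df_patients.columns if c.startswith('DIAGX')]

for condition, codes in all_codes.items():
    # isin gives a boolean frame; any(axis=1) collapses it to one flag per row
    df_patients[condition + '_yn'] = (
        df_patients[diag_cols].isin(codes).any(axis=1).astype(int)
    )

print(df_patients)
```

Note that isin matches codes exactly, so a longer code such as 'F011' would not count as 'F01'. If your codes need prefix matching, that would require a different check.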