I have a dataframe where each row represents a patient and several columns list medical diagnoses. A simplified version is given below. Patients with fewer recorded diagnoses have empty strings in the remaining diagnosis columns.
import pandas as pd
data = {'id': [1, 2, 3, 4], 'diag_1': ['stroke', 'stroke', 'cancer', 'heart disease'], 'diag_2': ['dementia', 'heart disease', 'copd', 'hypertension'], 'diag_3': ['hypertension', '', '', '']}
df = pd.DataFrame(data=data)
I have a list of diagnoses, which are inclusion criteria for a study:
diagnoses = ['stroke', 'heart disease']
I want to add a column to the dataframe containing True/False (or 0/1) that indicates whether the patient has at least one of the diagnoses from the diagnoses list in any of the diagnosis columns.
Answer:
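You can select the diagnosis columns with filter, test each cell against the list with isin, and then check whether any cell in each row matched: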
df['flag'] = df.filter(like='diag').isin(diagnoses).any(axis=1)
print(df)
   id         diag_1         diag_2        diag_3   flag
0   1         stroke       dementia  hypertension   True
1   2         stroke  heart disease                 True
2   3         cancer           copd                False
3   4  heart disease   hypertension                 True
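The one-liner chains three steps: filter selects the diagnosis columns, isin produces a True/False frame marking which cells appear in diagnoses, and any(axis=1) reduces each row to a single flag. The sketch below spells those steps out and adds the 0/1 variant via astype(int). It also makes two assumptions that go beyond the question's sample data: the blank cells are NaN instead of empty strings (common when the data is loaded from a file), and there is a hypothetical extra column, diag_notes, whose name also contains 'diag'. Because filter(like='diag') is a substring match, it would include such a column; filter(regex=...) limits the selection to columns named diag_<number>.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with two assumed changes for illustration:
# blank cells are NaN (as read_csv would typically produce), and there is a
# hypothetical free-text column 'diag_notes' that should NOT be searched.
df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'diag_1': ['stroke', 'stroke', 'cancer', 'heart disease'],
    'diag_2': ['dementia', 'heart disease', 'copd', 'hypertension'],
    'diag_3': ['hypertension', np.nan, np.nan, np.nan],
    'diag_notes': ['', '', 'stroke', ''],  # hypothetical column
})
diagnoses = ['stroke', 'heart disease']

# Step 1: select only columns named diag_<number>.
# filter(like='diag') would also pick up 'diag_notes' (substring match).
diag_cols = df.filter(regex=r'^diag_\d+$')

# Step 2: cell-by-cell membership test; NaN is never in the list, so it is False.
matches = diag_cols.isin(diagnoses)

# Step 3: flag a row if at least one of its diagnosis cells matched.
df['flag'] = matches.any(axis=1)

# Optional: 0/1 instead of True/False.
df['flag_int'] = df['flag'].astype(int)

# For comparison: the substring-based selection also searches 'diag_notes'
# and therefore flags patient 3 as well.
loose = df.filter(like='diag').isin(diagnoses).any(axis=1)

print(df[['id', 'flag', 'flag_int']])
```

The regex assumes every diagnosis column follows the diag_<number> naming pattern; if your real column names differ, adjust the pattern (or pass an explicit list of column names to df[...] instead of using filter).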