Check if any value in a list exists in a group of dataframe columns and create new boolean column

Time:07-22

I have a dataframe where each row represents a patient and several columns list medical diagnoses. A simplified version is given below. Some patients have empty diagnosis columns, depending on how many diagnoses they have recorded.

import pandas as pd

data = {'id': [1, 2, 3, 4], 'diag_1': ['stroke', 'stroke', 'cancer', 'heart disease'], 'diag_2': ['dementia', 'heart disease', 'copd', 'hypertension'], 'diag_3': ['hypertension', '', '', '']}
df = pd.DataFrame(data=data)

I have a list of diagnoses, which are inclusion criteria for a study:

diagnoses = ['stroke', 'heart disease']

I want to add a column to the dataframe holding True/False (or 0/1) that indicates whether the patient has at least one of the diagnoses from the diagnoses list in any of the diagnosis columns.

Answer:

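You can select the diagnosis columns with filter, test each cell for membership with isin, and reduce across each row with any(axis=1):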

df['flag'] = df.filter(like='diag').isin(diagnoses).any(axis=1)
print(df)

   id         diag_1         diag_2        diag_3   flag
0   1         stroke       dementia  hypertension   True
1   2         stroke  heart disease                 True
2   3         cancer           copd                False
3   4  heart disease   hypertension                 True
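
For readers who want to see what each step of that chain produces, here is a minimal sketch that splits it apart, using the same sample data as the question. The intermediate names (diag_cols, matches, flag_int, explicit_cols, flag_explicit) are illustrative and not part of the original answer. It also shows one way to get the 0/1 form the question mentions, and an explicit column list as an alternative in case other column names in a real dataset happen to contain 'diag'.

```python
import pandas as pd

data = {
    'id': [1, 2, 3, 4],
    'diag_1': ['stroke', 'stroke', 'cancer', 'heart disease'],
    'diag_2': ['dementia', 'heart disease', 'copd', 'hypertension'],
    'diag_3': ['hypertension', '', '', ''],
}
df = pd.DataFrame(data)
diagnoses = ['stroke', 'heart disease']

# Step 1: keep only the columns whose name contains 'diag'.
diag_cols = df.filter(like='diag')

# Step 2: element-wise membership test; the result is a boolean
# frame with the same shape as diag_cols.
matches = diag_cols.isin(diagnoses)

# Step 3: a row is True if at least one of its columns matched.
flag = matches.any(axis=1)

# 0/1 variant: cast the boolean Series to integers.
flag_int = flag.astype(int)

# Alternative selection (assumption: you know the column names up front).
# This avoids picking up unrelated columns that contain 'diag'.
explicit_cols = ['diag_1', 'diag_2', 'diag_3']
flag_explicit = df[explicit_cols].isin(diagnoses).any(axis=1)

print(flag.tolist())
print(flag_int.tolist())
```

Note that isin compares whole cell values exactly, so the empty strings in diag_3 never match and a value such as 'Stroke' with different capitalisation would not match either.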