Home > Software engineering >  From a set of columns with true/false values, say which column has a True value
From a set of columns with true/false values, say which column has a True value

Time:10-31

I have a df with with several columns which have only True/False values. I want to create another column whose value will tell me which column has a True value. HEre's an example:

index bol_1 bol_2 bol_3 criteria
1 True False False bol_1
2 False True False bol_2
3 True True False [bol_1, bol_2]

My objective is to know which rows have True values(at least 1), and which columns are responsible for those True values. I want to be able to some basic statistics on this new column, e.g. for how many rows is bol_1 the unique column to have a True values.

CodePudding user response:

Use DataFrame.select_dtypes for boolean columns, convert columns names to array and in list comprehension filter Trues values:

df1 = df.select_dtypes(bool)
cols = df1.columns.to_numpy()

df['criteria'] = [list(cols[x]) for x in df1.to_numpy()]

print (df)
   bol_1  bol_2  bol_3        criteria
1   True  False  False         [bol_1]
2  False   True  False         [bol_2]
3   True   True  False  [bol_1, bol_2]

If performance is not important use DataFrame.apply:

df['criteria'] = df1.apply(lambda x: cols[x], axis=1)

CodePudding user response:

A possible solution:

df.assign(criteria=df.apply(lambda x: list(
    df.columns[1:][x[1:] == True]), axis=1))

Output:

   index  bol_1  bol_2  bol_3        criteria
0      1   True  False  False         [bol_1]
1      2  False   True  False         [bol_2]
2      3   True   True  False  [bol_1, bol_2]
  • Related