I have a df with with several columns which have only True/False values.
I want to create another column whose value will tell me which column has a True
value.
HEre's an example:
index | bol_1 | bol_2 | bol_3 | criteria |
---|---|---|---|---|
1 | True | False | False | bol_1 |
2 | False | True | False | bol_2 |
3 | True | True | False | [bol_1 , bol_2 ] |
My objective is to know which rows have True values(at least 1), and which columns are responsible for those True values. I want to be able to some basic statistics on this new column, e.g. for how many rows is bol_1 the unique column to have a True values.
CodePudding user response:
Use DataFrame.select_dtypes
for boolean columns, convert columns names to array and in list comprehension filter True
s values:
df1 = df.select_dtypes(bool)
cols = df1.columns.to_numpy()
df['criteria'] = [list(cols[x]) for x in df1.to_numpy()]
print (df)
bol_1 bol_2 bol_3 criteria
1 True False False [bol_1]
2 False True False [bol_2]
3 True True False [bol_1, bol_2]
If performance is not important use DataFrame.apply
:
df['criteria'] = df1.apply(lambda x: cols[x], axis=1)
CodePudding user response:
A possible solution:
df.assign(criteria=df.apply(lambda x: list(
df.columns[1:][x[1:] == True]), axis=1))
Output:
index bol_1 bol_2 bol_3 criteria
0 1 True False False [bol_1]
1 2 False True False [bol_2]
2 3 True True False [bol_1, bol_2]