I have a dataframe of Boolean values like this:
black | yellow | orange |
---|---|---|
TRUE | TRUE | TRUE |
FALSE | TRUE | FALSE |
TRUE | TRUE | FALSE |
FALSE | FALSE | TRUE |
I want a separate column that summarizes each row by listing the names of the columns whose value is TRUE. The new column would be:
summary |
---|
black, yellow, orange |
yellow |
black, yellow |
orange |
Any idea how to do this please? Thanks!
CodePudding user response:
You can use each row as a selection mask to filter the column names:
(
df.astype("bool")
.apply(lambda row: ", ".join(df.columns[row]), axis=1)
.to_frame("summary")
)
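For anyone who wants to run this end to end, here is a minimal sketch of the same row-mask idea. It assumes the frame is rebuilt by hand from the table in the question (the name `df` and the literal values are my reconstruction, not the asker's actual data), and it converts each row to a NumPy array before indexing the column names, which is just a cautious choice on my part.

```python
import pandas as pd

# Assumed reconstruction of the question's table.
df = pd.DataFrame(
    {
        "black": [True, False, True, False],
        "yellow": [True, True, True, False],
        "orange": [True, False, False, True],
    }
)

# Each row is a boolean mask over the column names; keep the names
# where the mask is True and join them with ", ".
summary = (
    df.astype("bool")
    .apply(lambda row: ", ".join(df.columns[row.to_numpy()]), axis=1)
    .to_frame("summary")
)

print(summary)
```

This leaves `df` untouched and returns a one-column frame; assign `summary["summary"]` back to `df` if you want it alongside the original columns.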
CodePudding user response:
Try this using pd.DataFrame.dot:
df_colors['summary'] = df_colors.dot(df_colors.columns + ', ').str.rstrip(', ')
df_colors
Output:
black yellow orange summary
0 True True True black, yellow, orange
1 False True False yellow
2 True True False black, yellow
3 False False True orange
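Why the dot product works here: in Python, `True * "black, "` is `"black, "` and `False * "black, "` is the empty string, so multiplying each Boolean by its column label and summing across the row concatenates only the labels that are True. Below is a small self-contained sketch of that, assuming `df_colors` is rebuilt by hand from the question's table (the values are my reconstruction).

```python
import pandas as pd

# Assumed reconstruction of the question's table.
df_colors = pd.DataFrame(
    {
        "black": [True, False, True, False],
        "yellow": [True, True, True, False],
        "orange": [True, False, False, True],
    }
)

# The scalar version of the trick: bool * str keeps or drops the string.
kept = True * "black, "
dropped = False * "black, "

# Append the separator to every label, take the dot product row by row,
# then trim the trailing separator left by the last kept label.
labels = df_colors.columns + ", "
summary = df_colors.dot(labels).str.rstrip(", ")

print(summary)
```

Computing `summary` before assigning it keeps the frame purely Boolean at the time of the dot product, which matters if you rerun the cell after the `summary` column already exists.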