I have a Pandas dataframe with multiple boolean columns that indicate properties like so:
df = pd.DataFrame(
{
"id": [0, 1, 2, 3],
"non_important_column": ["some", "random", "unneeded", "values"],
"property_A": [False, True, False, False],
"property_B": [False, False, False, True],
"property_C": [False, False, True, False],
}
)
id non_important_column property_A property_B property_C
0 0 some False False False
1 1 random True False False
2 2 unneeded False False True
3 3 values False True False
And a mapping dict like so:
mapping_dict = {False: "none", "property_A": "foo", "property_B": "bar", "property_C": "baz"}
I would like a new result
column to look like the following:
id non_important_column property_A property_B property_C result
0 0 some False False False none
1 1 random True False False foo
2 2 unneeded False False True baz
3 3 values False True False bar
Or, in other words,
- Create a new column called
result
- Look at each of the
property_*
columns - If all of the
property_*
columns for that row areFalse
, returnnone
as shown in themapping_dict
. - If one of the
property_*
columns for that row areTrue
(and only one will be), then return the value in themapping_dict
for the column name, which is a key inside of themapping_dict
.
I have already tried to use the approach similar to the question Melt multiple boolean columns in a single column in pandas, but this is slightly different. I got weird results from using idxmax(1).to_frame('result')
.
CodePudding user response:
If only one or no Trues values per rows use DataFrame.filter
with DataFrame.idxmax
, then use Series.map
and last replace only False
s rows by Series.where
:
df1 = df.filter(like='property_')
df['result'] = (df1.idxmax(axis=1).map(mapping_dict)
.where(df1.any(axis=1), mapping_dict[False]))
print (df)
id non_important_column property_A property_B property_C result
0 0 some False False False none
1 1 random True False False foo
2 2 unneeded False False True baz
3 3 values False True False bar
CodePudding user response:
res = []
for index, row in df[["property_A","property_B","property_C"]].iterrows():
if sum(row) == 1:
val = row[row == True].index
res.append(mapping_dict[val[0]])
else:
res.append(mapping_dict[False])
df['result'] = res
df
id non_important_column property_A property_B property_C result
0 0 some False False False none
1 1 random True False False foo
2 2 unneeded False False True baz
3 3 values False True False bar
If you have multiple True in a row you can handle by adding and elif case