Home > Mobile >  Pandas multiple boolean columns to new single column with mapping dict
Pandas multiple boolean columns to new single column with mapping dict

Time:10-11

I have a Pandas dataframe with multiple boolean columns that indicate properties like so:

df = pd.DataFrame(
    {
        "id": [0, 1, 2, 3],
        "non_important_column": ["some", "random", "unneeded", "values"],
        "property_A": [False, True, False, False],
        "property_B": [False, False, False, True],
        "property_C": [False, False, True, False],
    }
)

   id non_important_column  property_A  property_B  property_C
0   0                 some       False       False       False
1   1               random        True       False       False
2   2             unneeded       False       False        True
3   3               values       False        True       False

And a mapping dict like so:

mapping_dict = {False: "none", "property_A": "foo", "property_B": "bar", "property_C": "baz"}

I would like a new result column to look like the following:

   id non_important_column  property_A  property_B  property_C result
0   0                 some       False       False       False   none
1   1               random        True       False       False    foo
2   2             unneeded       False       False        True    baz
3   3               values       False        True       False    bar

Or, in other words,

  1. Create a new column called result
  2. Look at each of the property_* columns
  3. If all of the property_* columns for that row are False, return none as shown in the mapping_dict.
  4. If one of the property_* columns for that row are True (and only one will be), then return the value in the mapping_dict for the column name, which is a key inside of the mapping_dict.

I have already tried to use the approach similar to the question Melt multiple boolean columns in a single column in pandas, but this is slightly different. I got weird results from using idxmax(1).to_frame('result').

CodePudding user response:

If only one or no Trues values per rows use DataFrame.filter with DataFrame.idxmax, then use Series.map and last replace only Falses rows by Series.where:

df1 = df.filter(like='property_')
df['result'] = (df1.idxmax(axis=1).map(mapping_dict)
                    .where(df1.any(axis=1), mapping_dict[False]))
print (df)
   id non_important_column  property_A  property_B  property_C result
0   0                 some       False       False       False   none
1   1               random        True       False       False    foo
2   2             unneeded       False       False        True    baz
3   3               values       False        True       False    bar

CodePudding user response:

res = []
for index, row in df[["property_A","property_B","property_C"]].iterrows():
    if sum(row) == 1:
        val = row[row == True].index 
        res.append(mapping_dict[val[0]])
    else:
        res.append(mapping_dict[False])

df['result'] = res
df

    id  non_important_column    property_A  property_B  property_C  result
0   0   some     False  False   False   none
1   1   random   True   False   False   foo
2   2   unneeded False  False   True    baz
3   3   values   False  True    False   bar

If you have multiple True in a row you can handle by adding and elif case

  • Related