Home > Net >  Combine multiple columns into one category column using the column names as value label
Combine multiple columns into one category column using the column names as value label

Time:09-08

I have this data

   ID      A      B      C
0   0   True  False  False
1   1  False   True  False
2   2  False  False   True

And want to transform it into

   ID group
0   0     A
1   1     B
2   2     C
  • I want to use the column names as value labels for the category column.
  • There is maximum only one True value per row.

This is the MWE

#!/usr/bin/env python3
import pandas as pd

df = pd.DataFrame({
    'ID': range(3),
    'A': [True, False, False],
    'B': [False, True, False],
    'C': [False, False, True]
})

result = pd.DataFrame({
    'ID': range(3),
    'group': ['A', 'B', 'C']
})
result.group = result.group.astype('category')

print(df)
print(result)

I could do df.apply(lambda row: ...magic.., axis=1). But isn't there a more elegant way with pandas own tools?

CodePudding user response:

Yet another way:

df.set_index(['ID'])\
  .rename_axis('group', axis=1)\ # getting column name correct
  .stack()\                      # reshaping getting column headers into dataframe rows
  .loc[lambda x: x]\             # filtering for True
  .reset_index()\                # moving ID back into dataframe columns
  .drop(0, axis=1)               # dropping boolean column

Output:

   ID group
0   0     A
1   1     B
2   2     C

CodePudding user response:

You can use df.dot:

df['group'] = df[['A', 'B', 'C']].dot(df.columns[1:])

CodePudding user response:

You could use pd.melt() to reshape and rename, and then drop the False rows in your 'value' column using query which allows for a chained operation:

pd.melt(df,id_vars=['ID'],var_name= 'group').query('value') \
    .drop('value',axis=1).reset_index(drop=True)

   ID group
0   0     A
1   1     B
2   2     C

CodePudding user response:

You can use melt then a lookup based on the column where the values are true to get the results you are expecting

df = df.melt(id_vars = 'ID', var_name = 'group')
df.loc[df['value'] == True][['ID', 'group']]

CodePudding user response:

More verbose than melt, but this drops the invalid columns during the reshaping:

(df.set_index('ID')
   .rename_axis(columns='group')
   .replace(False, pd.NA)
   .stack().reset_index().drop(columns=0)
)

output:

   ID group
0   0     A
1   1     B
2   2     C

CodePudding user response:

Try with apply lambda

df.set_index('ID').apply(lambda x : x.index[x][0],axis=1)
Out[39]: 
ID
0    A
1    B
2    C
dtype: object

CodePudding user response:

idxmax

s = df.set_index('ID')
s.idxmax(1).where(s.any(1))

ID
0    A
1    B
2    C
dtype: object
  • Related