Create a single categorical variable based on many dummy variables

Time:12-03

I have several category dummy variables that are mutually exclusive:

id  cat1 cat2 cat3
A    0    0    1
B    1    0    0
C    1    0    0
D    0    0    1
E    0    1    0
F    0    0    1
..

I want to create a new column that contains the category of each row:

id  cat1 cat2 cat3 type
A    0    0    1   cat3
B    1    0    0   cat1
C    1    0    0   cat1
D    0    0    1   cat3
E    0    1    0   cat2
F    0    0    1   cat3
..

CodePudding user response:

You can use pandas.from_dummies (available in pandas 1.5 and later) together with DataFrame.filter, which here selects the columns whose names contain "cat":

df['type'] = pd.from_dummies(df.filter(like='cat'))

Output:

  id  cat1  cat2  cat3  type
0  A     0     0     1  cat3
1  B     1     0     0  cat1
2  C     1     0     0  cat1
3  D     0     0     1  cat3
4  E     0     1     0  cat2
5  F     0     0     1  cat3
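
For reference, here is a minimal self-contained sketch of the from_dummies approach above. It assumes pandas 1.5 or later and rebuilds the question's sample frame by hand. Because from_dummies returns a one-column DataFrame, the sketch takes that column explicitly with .iloc[:, 0] before assigning it, which is a small variation on the one-liner above.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "id": list("ABCDEF"),
    "cat1": [0, 1, 1, 0, 0, 0],
    "cat2": [0, 0, 0, 0, 1, 0],
    "cat3": [1, 0, 0, 1, 0, 1],
})

# Keep only the dummy columns (names containing "cat")
dummies = df.filter(like="cat")

# from_dummies returns a one-column DataFrame; take that column as a Series
df["type"] = pd.from_dummies(dummies).iloc[:, 0]

print(df)
```

Note that from_dummies expects each row to have exactly one 1. It raises a ValueError if a row has several, or if a row has none and no default_category is given, so it fits the mutually exclusive case described in the question.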

CodePudding user response:

Use DataFrame.dot with DataFrame.filter to select the columns whose names contain the substring "cat". If a row has more than one 1, the matching column names are joined with commas:

m = df.filter(like='cat').eq(1)
# alternative: use all columns except the first
# m = df.iloc[:, 1:].eq(1)
# append ',' to each column name, then strip the trailing comma
df['type'] = m.dot(m.columns + ',').str[:-1]
print(df)
  id  cat1  cat2  cat3  type
0  A     0     0     1  cat3
1  B     1     0     0  cat1
2  C     1     0     0  cat1
3  D     0     0     1  cat3
4  E     0     1     0  cat2
5  F     0     0     1  cat3
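
To illustrate the comma-separated case, here is a small sketch of the same dot approach on a shortened frame. Row G is a hypothetical addition, not part of the question's data, with two dummies set to 1.

```python
import pandas as pd

# Rows A and B come from the question; row G is hypothetical (two 1s)
df = pd.DataFrame({
    "id": ["A", "B", "G"],
    "cat1": [0, 1, 1],
    "cat2": [0, 0, 0],
    "cat3": [1, 0, 1],
})

# Boolean mask of the dummy columns
m = df.filter(like="cat").eq(1)

# True * 'name,' keeps the name, False * 'name,' gives '';
# the dot product concatenates them per row, then the trailing comma is stripped
df["type"] = m.dot(m.columns + ",").str[:-1]

print(df["type"].tolist())
```

With this data, row G ends up with both of its column names joined by a comma, while the rows with a single 1 get one name each.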