Home > Mobile >  Create a new category based on column values: Pandas
Create a new category based on column values: Pandas

Time:02-17

I have the following dataframe

df

ID  Col_1   Col_2  Col_3 
1     0       1     1  
2     1       0     0
3     1       1     1
4     1       1     0

I would like to check each column other than ID have 0 values. If they have write the column name under a new column Category. If there are more 0 values with the same row, dublicate the row with different category value. If there is no any 0 value in entire row then leave empty the value of Category column.

I would like to get the dataframe as follows.

ID  Col_1   Col_2  Col_3   Category
1     0       1     1         Col_1
2     1       0     0         Col_2
2     1       0     0         Col_3
3     1       1     1
4     1       1     0         Col_3

Can any one help on this?

CodePudding user response:

If need all columns filled by 0 values use matrix multiplication dot, then use DataFrame.explode with splitted values (performance in large df should be worse):

df['Category'] = (df.iloc[:, 1:].eq(0).dot(df.columns[1:]   ',')
                     .str.strip(',').str.split(','))
                 
df = df.explode('Category')
print (df)
   ID  Col_1  Col_2  Col_3 Category
0   1      0      1      1    Col_1
1   2      1      0      0    Col_2
1   2      1      0      0    Col_3
2   3      1      1      1         
3   4      1      1      0    Col_3

EDIT:

s = df.set_index('ID').stack()
df = (df.join(s[s.eq(0)].reset_index(level=1)['level_1'].rename('Category'), on='ID')
        .fillna({'Category':''}))
print (df)
   ID  Col_1  Col_2  Col_3 Category
0   1      0      1      1    Col_1
1   2      1      0      0    Col_2
1   2      1      0      0    Col_3
2   3      1      1      1         
3   4      1      1      0    Col_3

Or:

s = df.melt('ID').query('value == 0').set_index('ID')['variable']
df = df.join(s.rename('Category'), on='ID').fillna({'Category':''})
print (df)
   ID  Col_1  Col_2  Col_3 Category
0   1      0      1      1    Col_1
1   2      1      0      0    Col_2
1   2      1      0      0    Col_3
2   3      1      1      1         
3   4      1      1      0    Col_3

CodePudding user response:

One option is to mask the non-zero, stack to get rid of them, and join to add the new column, while duplicating the rows as expected:

s = (df.drop('ID', axis=1).mask(lambda d: d.ne(0)).stack()
       .reset_index(level=1)['level_1'].rename('Category'))

df2 = df.join(s)

output:

   ID  Col_1  Col_2  Col_3 Category
0   1      0      1      1    Col_1
1   2      1      0      0    Col_2
1   2      1      0      0    Col_3
2   3      1      1      1         
3   4      1      1      0    Col_3

CodePudding user response:

how about this:

df['category']=df[['col1','col2','col3']]
.apply(lambda row:row[row==0].index.values, axis=1)

Output:

      col1  col2  col3      category
ID                                
0      1     0     1        [col2]
1      0     0     1  [col1, col2]
2      1     1     0        [col3]

As you can see ID the index in this example, but it doesn't have to be since you can select which columns you are working with, and then of course you can run:

df.explode('category')
  • Related