I have the following dataframe
df
ID Col_1 Col_2 Col_3
1 0 1 1
2 1 0 0
3 1 1 1
4 1 1 0
I would like to check each column other than ID have 0 values. If they have write the column name under a new column Category
. If there are more 0 values with the same row, dublicate the row with different category value. If there is no any 0 value in entire row then leave empty the value of Category
column.
I would like to get the dataframe as follows.
ID Col_1 Col_2 Col_3 Category
1 0 1 1 Col_1
2 1 0 0 Col_2
2 1 0 0 Col_3
3 1 1 1
4 1 1 0 Col_3
Can any one help on this?
CodePudding user response:
If need all columns filled by 0
values use matrix multiplication dot
, then use DataFrame.explode
with splitted values (performance in large df should be worse):
df['Category'] = (df.iloc[:, 1:].eq(0).dot(df.columns[1:] ',')
.str.strip(',').str.split(','))
df = df.explode('Category')
print (df)
ID Col_1 Col_2 Col_3 Category
0 1 0 1 1 Col_1
1 2 1 0 0 Col_2
1 2 1 0 0 Col_3
2 3 1 1 1
3 4 1 1 0 Col_3
EDIT:
s = df.set_index('ID').stack()
df = (df.join(s[s.eq(0)].reset_index(level=1)['level_1'].rename('Category'), on='ID')
.fillna({'Category':''}))
print (df)
ID Col_1 Col_2 Col_3 Category
0 1 0 1 1 Col_1
1 2 1 0 0 Col_2
1 2 1 0 0 Col_3
2 3 1 1 1
3 4 1 1 0 Col_3
Or:
s = df.melt('ID').query('value == 0').set_index('ID')['variable']
df = df.join(s.rename('Category'), on='ID').fillna({'Category':''})
print (df)
ID Col_1 Col_2 Col_3 Category
0 1 0 1 1 Col_1
1 2 1 0 0 Col_2
1 2 1 0 0 Col_3
2 3 1 1 1
3 4 1 1 0 Col_3
CodePudding user response:
One option is to mask
the non-zero, stack
to get rid of them, and join
to add the new column, while duplicating the rows as expected:
s = (df.drop('ID', axis=1).mask(lambda d: d.ne(0)).stack()
.reset_index(level=1)['level_1'].rename('Category'))
df2 = df.join(s)
output:
ID Col_1 Col_2 Col_3 Category
0 1 0 1 1 Col_1
1 2 1 0 0 Col_2
1 2 1 0 0 Col_3
2 3 1 1 1
3 4 1 1 0 Col_3
CodePudding user response:
how about this:
df['category']=df[['col1','col2','col3']]
.apply(lambda row:row[row==0].index.values, axis=1)
Output:
col1 col2 col3 category
ID
0 1 0 1 [col2]
1 0 0 1 [col1, col2]
2 1 1 0 [col3]
As you can see ID the index in this example, but it doesn't have to be since you can select which columns you are working with, and then of course you can run:
df.explode('category')