How to create a new column based on binary values of other columns

I have a dataset like the df below:

Person   atrib1  atrib2  atrib3   atrib4
Paulo      0      1       0         1
Andres     1      1       0         1

I want to create a new column atrib_list containing a list of the atrib columns whose value is 1, like this output:

Person   atrib1  atrib2  atrib3   atrib4     atrib_list
Paulo      0      1       0         1        ['atrib2','atrib4']
Andres     1      1       0         1        ['atrib1','atrib2','atrib4']

I am trying something like this:

df['atrib_list'] = df.apply(lambda x: x for x in df.columns if df.value==1)

but it's completely wrong.

CodePudding user response:

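Apply a function to each row (axis=1) and keep the index labels whose value equals 1:

df['new'] = df.apply(lambda y: [z for x, z in zip(y, y.index) if x == 1], axis=1)

The Series returned by apply looks like this:
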
0            [atrib2, atrib4]
1    [atrib1, atrib2, atrib4]
dtype: object
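
A self-contained sketch of the same row-wise idea, restricted to the atrib columns so that an unrelated column (say, a numeric ID that happens to equal 1) cannot end up in the list. The DataFrame is reconstructed from the question, and the name atrib_cols is an assumption added for illustration:

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "Person": ["Paulo", "Andres"],
    "atrib1": [0, 1],
    "atrib2": [1, 1],
    "atrib3": [0, 0],
    "atrib4": [1, 1],
})

# Hypothetical name: the columns that hold the binary flags.
atrib_cols = [c for c in df.columns if c.startswith("atrib")]

# For each row, keep the column labels whose value equals 1.
df["atrib_list"] = df[atrib_cols].apply(
    lambda row: [col for col, val in row.items() if val == 1],
    axis=1,
)

print(df["atrib_list"].tolist())
```

Selecting the flag columns before calling apply also means the Person strings are never compared against 1.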

CodePudding user response:

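You can use stack: replace the zeros with NA so that stack drops them (in pandas versions where stack discards missing values by default), then collect the remaining column labels for each row: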

df['atrib_list'] = (df
   .filter(like='atrib').replace(0, pd.NA)
   .stack().reset_index(1)
   .groupby(level=0)['level_1'].agg(list)
)
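
If you would rather not depend on stack dropping missing values, a variant is to stack first and then filter explicitly for the 1s. This is only a sketch under the assumption that the flag columns contain plain 0/1 integers; the intermediate names stacked, ones and labels are made up for illustration:

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "Person": ["Paulo", "Andres"],
    "atrib1": [0, 1],
    "atrib2": [1, 1],
    "atrib3": [0, 0],
    "atrib4": [1, 1],
})

# Long format: one value per (row label, column label) pair.
stacked = df.filter(like="atrib").stack()

# Keep only the pairs whose value is 1.
ones = stacked[stacked == 1]

# Column labels, indexed by the row they came from.
labels = pd.Series(
    ones.index.get_level_values(1),
    index=ones.index.get_level_values(0),
)

# Collect the labels of each row into a list; assignment aligns on the index.
df["atrib_list"] = labels.groupby(level=0).agg(list)

print(df["atrib_list"].tolist())
```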

Another idea, using itertools.compress:

from itertools import compress

cols = list(df.filter(like='atrib'))
df['atrib_list'] = df[cols].apply(lambda x: list(compress(cols, x)), axis=1)

output:

   Person  atrib1  atrib2  atrib3  atrib4                atrib_list
0   Paulo       0       1       0       1          [atrib2, atrib4]
1  Andres       1       1       0       1  [atrib1, atrib2, atrib4]
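
One difference worth checking on your own data is how a row with no 1s is handled. The sketch below adds a hypothetical third person, "Maria", with all flags set to 0 (she is not in the original data) and runs the compress approach on it:

```python
from itertools import compress

import pandas as pd

# Data from the question plus a hypothetical all-zero row ("Maria").
df = pd.DataFrame({
    "Person": ["Paulo", "Andres", "Maria"],
    "atrib1": [0, 1, 0],
    "atrib2": [1, 1, 0],
    "atrib3": [0, 0, 0],
    "atrib4": [1, 1, 0],
})

cols = list(df.filter(like="atrib"))

# compress keeps the labels whose matching row value is truthy (here: 1).
df["atrib_list"] = df[cols].apply(lambda x: list(compress(cols, x)), axis=1)

print(df["atrib_list"].tolist())
```

With compress the all-zero row gets an empty list. A groupby-based approach such as the stack one has no entries to group for that row, so after assignment it should show up as NaN instead, which may need a fill step if you want empty lists everywhere.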