I have a df with a structure similar to the data below, but with more columns and more rows. How can I get the expected result? My code raises an error - I take it that a list cannot be saved in a df cell? And how can I avoid loops, if possible?
import pandas as pd
import numpy as np

data = [[0, 0, 1], [0, 1, 0], [1, 0, 0], [1, 0, 1], [0, 0, 0]]
df = pd.DataFrame(data, columns=["choice_a", "choice_b", "choice_c"])
Expected result
       choice_a  choice_b  choice_c     choices
index
0             0         0         1       ['c']
1             0         1         0       ['b']
2             1         0         0       ['a']
3             1         0         1  ['a', 'c']
4             0         0         0          NA
My code
df['choices'] = 0
for i in np.arange(df.shape[0]):
    choice_list = []
    for j in np.arange(len(df.columns)):
        if df.iloc[i, j] == 1:
            choice_list.append(df.columns[j].split('_')[1])
    df.iloc[i, 4] = choice_list
Error I am getting
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/var/folders/65/3mqr9fpn37jf2xt2pxbcgp_w0000gn/T/ipykernel_1513/2279334138.py in <module>
5 if main_dataset.iloc[i,j]==1:
6 choice_list.append(main_dataset.columns[j].split('_')[1])
----> 7 main_dataset.iloc[i,5]=choice_list
~/opt/anaconda3/lib/python3.9/site-packages/pandas/core/indexing.py in __setitem__(self, key, value)
714
715 iloc = self if self.name == "iloc" else self.obj.iloc
--> 716 iloc._setitem_with_indexer(indexer, value, self.name)
717
718 def _validate_key(self, key, axis: int):
~/opt/anaconda3/lib/python3.9/site-packages/pandas/core/indexing.py in _setitem_with_indexer(self, indexer, value, name)
1689 if take_split_path:
1690 # We have to operate column-wise
-> 1691 self._setitem_with_indexer_split_path(indexer, value, name)
1692 else:
1693 self._setitem_single_block(indexer, value, name)
~/opt/anaconda3/lib/python3.9/site-packages/pandas/core/indexing.py in _setitem_with_indexer_split_path(self, indexer, value, name)
1744 return self._setitem_with_indexer((pi, info_axis[0]), value[0])
1745
-> 1746 raise ValueError(
1747 "Must have equal len keys and value "
1748 "when setting with an iterable"
ValueError: Must have equal len keys and value when setting with an iterable
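For context on the error: assigning a list through iloc makes pandas try to spread it element-wise across the selection, which is where the length mismatch comes from. A minimal sketch (toy column names, not the real dataset) showing that a single cell can hold a list when the column has object dtype and DataFrame.at is used instead:

import pandas as pd

demo = pd.DataFrame({"choice_a": [0, 1], "choice_b": [1, 1]})
demo["choices"] = None              # object-dtype column, so a cell can hold a list
demo.at[0, "choices"] = ["b"]       # .at targets exactly one cell, no broadcasting
print(demo)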
CodePudding user response:
Use a list comprehension for the new column, filtering the column names split by _; last, if necessary, replace the empty lists with missing values using Series.where:
cols = df.columns.str.split('_').str[1].to_numpy()
df['choices'] = [list(cols[x == 1]) for x in df.to_numpy()]
df['choices'] = df['choices'].where(df['choices'].astype(bool))
print (df)
   choice_a  choice_b  choice_c choices
0         0         0         1     [c]
1         0         1         0     [b]
2         1         0         0     [a]
3         1         0         1  [a, c]
4         0         0         0     NaN
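The where step works because astype(bool) treats an empty list as False, so only rows with at least one matching column keep their list. A small sketch of that behaviour in isolation, on a toy Series rather than the real data:

import pandas as pd

s = pd.Series([["a", "c"], [], ["b"]])
# Empty lists are falsy, so astype(bool) gives [True, False, True];
# where() keeps the truthy rows and fills the rest with NaN.
print(s.where(s.astype(bool)))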
CodePudding user response:
Another possible solution:
df['choices'] = df.apply(lambda x: df.columns[x == 1].str.replace(
r'^.*_', '', regex=True).values if x.any() else np.nan, axis=1)
Output:
   choice_a  choice_b  choice_c choices
0         0         0         1     [c]
1         0         1         0     [b]
2         1         0         0     [a]
3         1         0         1  [a, c]
4         0         0         0     NaN
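np.nan here assumes import numpy as np. Per row, the lambda boolean-indexes the column labels by the row's values and strips everything up to the underscore; a quick sketch of that selection for one hypothetical row (toy frame, not the real data):

import pandas as pd

demo = pd.DataFrame([[1, 0, 1]], columns=["choice_a", "choice_b", "choice_c"])
row = demo.iloc[0]
# Keep only the labels where the row holds a 1, then drop the "choice_" prefix.
print(demo.columns[row == 1].str.replace(r'^.*_', '', regex=True).values)
# expected: ['a' 'c']

Since apply with axis=1 calls the lambda once per row in Python, the vectorized list comprehension in the previous answer should generally be faster on large frames.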