Create column of lists


Is there a direct way to create a column col_list that lists which of fruit and clothes are not NaN in each row, and is NaN when both are missing?

Input df:

   fruit   clothes  trending  ...
0  apple   shirt    no
1  NaN     trouser  yes
2  NaN     NaN      yes

Expected output:

   fruit   clothes  trending  ...  col_list
0  apple   shirt    no             ["fruit", "clothes"]
1  NaN     trouser  yes            ["clothes"]
2  NaN     NaN      yes            NaN
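To try the answers, the sample frame from the question can be rebuilt as follows. This is a sketch that assumes only the three visible columns matter; the columns elided as "..." are left out.

```python
import numpy as np
import pandas as pd

# Sample data from the question; the elided "..." columns are omitted
df = pd.DataFrame({
    'fruit': ['apple', np.nan, np.nan],
    'clothes': ['shirt', 'trouser', np.nan],
    'trending': ['no', 'yes', 'yes'],
})

print(df)
```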

Answer:

Use a list comprehension over the Boolean mask returned by notna():

import numpy as np

L = ['fruit', 'clothes']

cols = np.array(L)
# One Boolean row per DataFrame row; cols[x] keeps the names where x is True
df['col_list'] = [cols[x].tolist() if x.any() else np.nan
                  for x in df[L].notna().to_numpy()]

print(df)
   fruit  clothes trending          col_list
0  apple    shirt       no  [fruit, clothes]
1    NaN  trouser      yes         [clothes]
2    NaN      NaN      yes               NaN
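The list comprehension relies on NumPy Boolean indexing: each row of df[L].notna().to_numpy() is a Boolean array, and cols[x] keeps only the column names where that array is True. A minimal sketch, using a hand-written mask (an assumption that mirrors the sample data) in place of the notna() result:

```python
import numpy as np

cols = np.array(['fruit', 'clothes'])

# Stand-in for df[L].notna().to_numpy(): True where a value is present
mask = np.array([[True, True],
                 [False, True],
                 [False, False]])

# Rows with no True at all get NaN instead of an empty list
result = [cols[x].tolist() if x.any() else np.nan for x in mask]
print(result)
```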

Or a solution with DataFrame.apply:

cols = np.array(L)
# x is one row of the Boolean mask; keep the names of the non-missing columns
f = lambda x: cols[x].tolist() if x.any() else np.nan
df['col_list'] = df[L].notna().apply(f, axis=1)
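A self-contained run of the apply solution on the sample data from the question (the frame construction is an assumption based on the table shown there). Inside f, each row arrives as a Boolean Series, which NumPy accepts as a mask for cols:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'fruit': ['apple', np.nan, np.nan],
                   'clothes': ['shirt', 'trouser', np.nan],
                   'trending': ['no', 'yes', 'yes']})
L = ['fruit', 'clothes']
cols = np.array(L)

# Called once per row with a Boolean Series indexed by the column names
f = lambda x: cols[x].tolist() if x.any() else np.nan
df['col_list'] = df[L].notna().apply(f, axis=1)

print(df)
```

Because apply with axis=1 calls a Python function per row, the list comprehension over the NumPy array is usually the faster of the two on large frames.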

Or a solution with DataFrame.dot:

L = ['fruit', 'clothes']

# The dot product concatenates 'name,' for every non-missing column in a row
df['col_list'] = (df[L].notna().dot(pd.Index(L) + ',')
                       .str.strip(',')
                       .replace('', np.nan)
                       .str.split(','))

print(df)
   fruit  clothes trending          col_list
0  apple    shirt       no  [fruit, clothes]
1    NaN  trouser      yes         [clothes]
2    NaN      NaN      yes               NaN
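To see why the DataFrame.dot trick works, it helps to look at the intermediate strings. In Python, True * 'fruit,' is 'fruit,' and False * 'fruit,' is '', so summing the products per row joins the names of the present columns. A sketch on the sample data (frame construction assumed from the question):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'fruit': ['apple', np.nan, np.nan],
                   'clothes': ['shirt', 'trouser', np.nan]})
L = ['fruit', 'clothes']

# Boolean mask (rows x columns) dotted with the suffixed column names
joined = df[L].notna().dot(pd.Index(L) + ',')
print(joined.tolist())    # ['fruit,clothes,', 'clothes,', '']

# Trim the trailing comma, turn empty strings into NaN, then split
col_list = joined.str.strip(',').replace('', np.nan).str.split(',')
print(col_list.tolist())  # [['fruit', 'clothes'], ['clothes'], nan]
```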

Answer:

IIUC, you can use stack and collect the column names per row:

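col_list = ['fruit', 'clothes']

# stack() gives a Series indexed by (row, column name); dropna() removes the
# missing cells explicitly, since the newer stack implementation keeps them
df['col_list'] = (df[col_list].stack()
                  .dropna()
                  .reset_index(1)['level_1']
                  .groupby(level=0).agg(list)
                 )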

Output:

   fruit  clothes trending          col_list
0  apple    shirt       no  [fruit, clothes]
1    NaN  trouser      yes         [clothes]
2    NaN      NaN      yes               NaN
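To see what the stack approach does at each step, the sketch below prints the intermediate stacked Series. It rebuilds the sample data without the trending column and adds an explicit dropna() after stack(); that is an assumption made so the result does not depend on whether the installed pandas version drops missing values while stacking.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'fruit': ['apple', np.nan, np.nan],
                   'clothes': ['shirt', 'trouser', np.nan]})
col_list = ['fruit', 'clothes']

# Long format: one entry per (row, column name) that holds a value
stacked = df[col_list].stack().dropna()
print(stacked)

# Move the column-name level into a column ('level_1'), then gather the
# names per original row; row 2 has no entries and becomes NaN on assignment
names = stacked.reset_index(level=1)['level_1'].groupby(level=0).agg(list)
df['col_list'] = names
print(df)
```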