Is there a direct way to create a col_list column that lists, for each row, which of fruit and clothes are not NaN?
df:
   fruit  clothes  trending  ...
0  apple  shirt    no
1  NaN    trouser  yes
2  NaN    NaN      yes

Desired output:
   fruit  clothes  trending  ...  col_list
0  apple  shirt    no             ["fruit", "clothes"]
1  NaN    trouser  yes            ["clothes"]
2  NaN    NaN      yes            NaN
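To make the question reproducible, here is a minimal sketch that builds the sample frame and the per-cell "not NaN" mask that col_list is derived from. It assumes the elided "..." columns can be ignored, so only fruit, clothes and trending are included; the name mask is illustrative.

```python
import numpy as np
import pandas as pd

# Sample input from the question; the elided "..." columns are left out (assumption).
df = pd.DataFrame({
    "fruit": ["apple", np.nan, np.nan],
    "clothes": ["shirt", "trouser", np.nan],
    "trending": ["no", "yes", "yes"],
})

# True where a cell holds a value; col_list should name the True columns per row.
mask = df[["fruit", "clothes"]].notna()
print(mask)
```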
Answer:
Use a list comprehension:
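import numpy as np
import pandas as pd
L = ['fruit', 'clothes']
cols = np.array(L)
# Each x is one row of the boolean mask; keep the names where it is True, else NaN
df['col_list'] = [cols[x].tolist() if x.any() else np.nan for x in df[L].notna().to_numpy()]
print(df)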
   fruit  clothes  trending          col_list
0  apple    shirt        no  [fruit, clothes]
1    NaN  trouser       yes         [clothes]
2    NaN      NaN       yes               NaN
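The list comprehension relies on NumPy boolean indexing: each row of the notna() mask selects the matching names from cols. A small sketch of that step for a single row, using a hand-written mask for row 1 (fruit missing, clothes present); row_mask and selected are illustrative names, not part of the answer.

```python
import numpy as np

cols = np.array(["fruit", "clothes"])

# Boolean mask for one row: fruit is NaN, clothes has a value.
row_mask = np.array([False, True])

# Boolean indexing keeps only the names where the mask is True.
selected = cols[row_mask].tolist()
print(selected)  # ['clothes']

# An all-False mask selects nothing, which is why the answer checks
# x.any() and falls back to NaN instead of returning an empty list.
empty = cols[np.array([False, False])].tolist()
print(empty)  # []
```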
Or a solution with apply:
cols = np.array(L)
# x is one row of the boolean mask; cols[x] keeps the names where x is True
f = lambda x: cols[x].tolist() if x.any() else np.nan
df['col_list'] = df[L].notna().apply(f, axis=1)
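A self-contained sketch of the apply variant, assuming the three-column sample frame from the question. The helper present_columns is a hypothetical named replacement for the lambda; it converts each row mask with to_numpy() before indexing, a defensive choice so the boolean selection is done on a plain NumPy array rather than a pandas Series.

```python
import numpy as np
import pandas as pd

# Sample frame from the question (the "..." columns are omitted).
df = pd.DataFrame({
    "fruit": ["apple", np.nan, np.nan],
    "clothes": ["shirt", "trouser", np.nan],
    "trending": ["no", "yes", "yes"],
})

L = ["fruit", "clothes"]
cols = np.array(L)

def present_columns(mask_row):
    """Return the names of non-missing columns in one row, or NaN if none."""
    mask = mask_row.to_numpy()
    return cols[mask].tolist() if mask.any() else np.nan

# axis=1 passes each row of the boolean mask to present_columns.
df["col_list"] = df[L].notna().apply(present_columns, axis=1)
print(df)
```

Because the function returns plain lists (not Series), apply keeps them as one object column instead of expanding them into separate columns.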
Or a solution with DataFrame.dot:
L = ['fruit', 'clothes']
# bool mask dot 'name,' strings concatenates the names of the non-missing columns
df['col_list'] = (df[L].notna().dot(pd.Index(L) + ',')
                  .str.strip(',')
                  .replace('', np.nan)
                  .str.split(','))
print(df)
   fruit  clothes  trending          col_list
0  apple    shirt        no  [fruit, clothes]
1    NaN  trouser       yes         [clothes]
2    NaN      NaN       yes               NaN
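The dot trick works because multiplying a bool by a string in Python repeats the string zero or one times (True * 'fruit,' is 'fruit,', False * 'fruit,' is ''), so the matrix product concatenates the names of the non-missing columns. A sketch that exposes the intermediate strings before stripping and splitting, assuming the same sample frame; joined is an illustrative name.

```python
import numpy as np
import pandas as pd

# Sample frame from the question, reduced to the two relevant columns.
df = pd.DataFrame({
    "fruit": ["apple", np.nan, np.nan],
    "clothes": ["shirt", "trouser", np.nan],
})

L = ["fruit", "clothes"]

# Each True contributes "name," and each False contributes "".
joined = df[L].notna().dot(pd.Index(L) + ",")
print(joined.tolist())  # ['fruit,clothes,', 'clothes,', '']

# Drop the trailing comma, turn empty strings into NaN, then split into lists.
col_list = joined.str.strip(",").replace("", np.nan).str.split(",")
print(col_list.tolist())
```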
Answer:
If I understand correctly, you can use:
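col_list = ['fruit', 'clothes']
# stack() gives one entry per non-missing cell; dropna() keeps that true on
# pandas versions whose stack() no longer drops NaN
df['col_list'] = (df[col_list].stack().dropna()
                  .reset_index(1)['level_1']
                  .groupby(level=0).agg(list)
                  )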
Output:
   fruit  clothes  trending          col_list
0  apple    shirt        no  [fruit, clothes]
1    NaN  trouser       yes         [clothes]
2    NaN      NaN       yes               NaN
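This answer depends on stack() dropping missing cells, which is what the long-standing stack implementation does by default; the newer implementation (opt-in via future_stack=True since pandas 2.1) keeps NaN values, so an explicit dropna() is a defensive addition. A sketch that shows the intermediate long Series, assuming the same sample frame; stacked and names are illustrative names. Row 2 has no entries after dropping NaN, so assigning the grouped result back aligns on the index and leaves NaN there.

```python
import numpy as np
import pandas as pd

# Sample frame from the question (the "..." columns are omitted).
df = pd.DataFrame({
    "fruit": ["apple", np.nan, np.nan],
    "clothes": ["shirt", "trouser", np.nan],
    "trending": ["no", "yes", "yes"],
})

col_list = ["fruit", "clothes"]

# Long format indexed by (row label, column name); dropna() discards the
# missing cells even if the installed stack() keeps them.
stacked = df[col_list].stack().dropna()
print(stacked)

# Move the column-name level into a column, then collect names per row label.
names = stacked.reset_index(1)["level_1"]
df["col_list"] = names.groupby(level=0).agg(list)
print(df)
```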