Below is the DF,
df = pd.DataFrame({'cd1' : ['PFE1', 'PFE25', np.nan, np.nan],
'cd2' : [np.nan, 'PFE28', 'PFE23', 'PFE14'],
'cd3' : ['PFE15', 'PFE2', 'PFE83', np.nan],
'cd4' : ['PFE25', np.nan, 'PFE39', 'PFE47'],
'cd5' : [np.nan, 'PFE21', 'PFE53', 'PFE15']})
df
cd1 cd2 cd3 cd4 cd5
0 PFE1 NaN PFE15 PFE25 NaN
1 PFE25 PFE28 PFE2 NaN PFE21
2 NaN PFE23 PFE83 PFE39 PFE53
3 NaN PFE14 NaN PFE47 PFE15
There 2 tasks to perform which I have some issues
- Combine Columns. As you can see below even wile filling NaN with nothing, the ',' is adding (not needed)
df = df.fillna('')
df['combined'] = df.values.tolist()
df
Output
cd1 cd2 cd3 cd4 cd5 combined
PFE1 PFE15 PFE25 ['PFE1', , 'PFE15', 'PFE25', ]
PFE25 PFE28 PFE2 PFE21 ['PFE25', 'PFE28', 'PFE2', , 'PFE21']
PFE23 PFE83 PFE39 PFE53 [, 'PFE23', 'PFE83', 'PFE39', 'PFE53']
PFE14 PFE47 PFE15 [, 'PFE14', , 'PFE47', 'PFE15']
- Remove words from the combined column which are not in spec_list
spec_list = ['PFE15', 'PFE25']
Expected output
combined new_col
['PFE1', , 'PFE15', 'PFE25', ] ['PFE15', 'PFE25']
['PFE25', 'PFE28', 'PFE2', , 'PFE21'] ['PFE25']
[, 'PFE23', 'PFE83', 'PFE39', 'PFE53']
[, 'PFE14', , 'PFE47', 'PFE15'] ['PFE15']
CodePudding user response:
Try this using agg
with axis=1:
df['combined'] = df.agg(lambda x: list(x.dropna()), axis=1)
spec_list = ['PFE15', 'PFE25']
df['new_col'] = df.agg(lambda x: list(x[x.isin(spec_list)]), axis=1)
Output:
cd1 cd2 cd3 cd4 cd5 combined new_col
0 PFE1 NaN PFE15 PFE25 NaN [PFE1, PFE15, PFE25] [PFE15, PFE25]
1 PFE25 PFE28 PFE2 NaN PFE21 [PFE25, PFE28, PFE2, PFE21] [PFE25]
2 NaN PFE23 PFE83 PFE39 PFE53 [PFE23, PFE83, PFE39, PFE53] []
3 NaN PFE14 NaN PFE47 PFE15 [PFE14, PFE47, PFE15] [PFE15]