How can I create a column called "reason" that shows which strings from match_d matched in each column?
match_d = {"col_a":["green", "purple"], "col_b":["weak", "stro", "strong"],...}
df
    fruit   col_a   col_b
0   apple  yellow     NaN
1    pear    blue     NaN
2  banana   green  strong
3  cherry   green   heavy
4  grapes   brown   light
...
Expected Output
    fruit   col_a   col_b  reason
0   apple  yellow     NaN  NaN
1    pear    blue     NaN  NaN
2  banana   green  strong  col_a:["green"], col_b:["stro", "strong"]
3  cherry   green   heavy  col_a:["green"]
4  grapes   brown   light
Answer:
Use a nested list comprehension that, for each column in match_d, joins the patterns found in each cell (a substring test, so "stro" matches "strong"), then combine the non-empty results with their column names:
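import numpy as np
import pandas as pd

match_d = {"col_a": ["green", "purple"], "col_b": ["weak", "stro", "strong"]}
cols = list(match_d.keys())
# one inner list per column: the patterns found as substrings in each cell ('' if none or NaN)
L = [[','.join(z for z in match_d[x] if pd.notna(y) and z in y)
      for y in df[x]] for x in df[cols]]
# zip(*L) gives one tuple per row; rows without any match become NaN
df['reason'] = [np.nan if ''.join(x) == '' else ';'.join(f'{a}:[{b}]'
                for a, b in zip(cols, x) if b != '')
                for x in zip(*L)]
print(df)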
    fruit   col_a   col_b                             reason
0   apple  yellow     NaN                                NaN
1    pear    blue     NaN                                NaN
2  banana   green  strong  col_a:[green];col_b:[stro,strong]
3  cherry   green   heavy                      col_a:[green]
4  grapes   brown   light                                NaN
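The least obvious step is `zip(*L)`, which turns the per-column lists into per-row tuples. Below is a minimal sketch with the intermediate `L` written out by hand; the values are derived from the five sample rows only, and `rows` and `reasons` are illustrative names, not part of the answer.

```python
import numpy as np

# What L holds after the nested comprehension for the five sample rows:
# one inner list per column, one match string per row ('' = no match).
L = [
    ["", "", "green", "green", ""],   # col_a
    ["", "", "stro,strong", "", ""],  # col_b
]
cols = ["col_a", "col_b"]

# zip(*L) transposes: each tuple now holds one row's matches across columns.
rows = list(zip(*L))
print(rows[2])  # ('green', 'stro,strong')

# Same formatting rule as the answer: skip empty matches, NaN if nothing matched.
reasons = [np.nan if ''.join(r) == '' else ';'.join(f'{a}:[{b}]' for a, b in zip(cols, r) if b != '')
           for r in rows]
print(reasons)
```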
Alternative solution with .apply:
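import numpy as np
import pandas as pd

match_d = {"col_a": ["green", "purple"], "col_b": ["weak", "stro", "strong"]}
cols = list(match_d.keys())
# apply runs once per column; x.name is the column label used to look up its patterns
df1 = df[cols].apply(lambda x: [','.join(z for z in match_d[x.name]
                                         if pd.notna(y) and z in y) for y in x])
# each row of df1 holds that row's matches per column, in the order of cols
df['reason'] = [np.nan if ''.join(x) == '' else ';'.join(f'{a}:[{b}]'
                for a, b in zip(cols, x) if b != '')
                for x in df1.to_numpy()]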
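If a per-row function is easier to read, the same column can be sketched with `DataFrame.apply(..., axis=1)`. This is a swapped-in variant rather than either method above; it assumes `df` holds only the five sample rows shown in the question, and the helper `row_reason` is a hypothetical name. Row-wise `apply` builds a Series for every row, so it tends to be slower than the list comprehensions on large frames.

```python
import numpy as np
import pandas as pd

# Assumption: only the five sample rows from the question.
df = pd.DataFrame({
    "fruit": ["apple", "pear", "banana", "cherry", "grapes"],
    "col_a": ["yellow", "blue", "green", "green", "brown"],
    "col_b": [np.nan, np.nan, "strong", "heavy", "light"],
})
match_d = {"col_a": ["green", "purple"], "col_b": ["weak", "stro", "strong"]}


def row_reason(row):
    """Hypothetical helper: build the 'reason' string for a single row."""
    parts = []
    for col, patterns in match_d.items():
        value = row[col]
        if pd.isna(value):  # missing cells cannot match anything
            continue
        hits = [p for p in patterns if p in value]  # substring matches
        if hits:
            parts.append(f"{col}:[{','.join(hits)}]")
    return ";".join(parts) if parts else np.nan


df["reason"] = df.apply(row_reason, axis=1)
print(df)
```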