I have a dataframe:
Index UX PDPre1
0 U3 ["H_U1", "H_U8", "Y_U2", "H_U3", "Y_U3_O"]
1 U4 ["H_U1", "H_U4", "Y_U2", "H_U3", "Y_U3"]
2 U2 ["H_U1", "H_U8", "Y_U2", "H_U3", "Y_U3"]
I would like to filter the elements in PDPre1 based on the keyword in the UX column. Expected output:
Index UX PDPre1 Output
0 U3 ["H_U1", "H_U8", "Y_U2", "H_U3", "Y_U3_O"] ["H_U3", "Y_U3_O"]
1 U4 ["H_U1", "H_U4", "Y_U2", "H_U3", "Y_U3"] ["H_U4"]
2 U2 ["H_U1", "H_U8", "Y_U2", "H_U3", "Y_U3"] ["Y_U2"]
I tried the code below, but it fails:
df["Output"]=list(filter(lambda x: df[df.UX] in x, df.PDPre1))
Please help. Thanks.
CodePudding user response:
Use a nested list comprehension with the in operator to test for a
substring match. This should be preferred over apply(axis=1)
solutions for performance reasons:
df["Output"] = [[x for x in b if a in x] for a, b in zip(df['UX'], df.PDPre1)]
print (df)
Index UX PDPre1 Output
0 0 U3 [H_U1, H_U8, Y_U2, H_U3, Y_U3_O] [H_U3, Y_U3_O]
1 1 U4 [H_U1, H_U4, Y_U2, H_U3, Y_U3] [H_U4]
2 2 U2 [H_U1, H_U8, Y_U2, H_U3, Y_U3] [Y_U2]
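For reference, here is a self-contained version of the above that rebuilds the sample frame from the question. It is only a sketch and assumes PDPre1 already holds Python lists rather than strings:

```python
import pandas as pd

# Sample data copied from the question; PDPre1 is assumed to hold real lists.
df = pd.DataFrame({
    "UX": ["U3", "U4", "U2"],
    "PDPre1": [
        ["H_U1", "H_U8", "Y_U2", "H_U3", "Y_U3_O"],
        ["H_U1", "H_U4", "Y_U2", "H_U3", "Y_U3"],
        ["H_U1", "H_U8", "Y_U2", "H_U3", "Y_U3"],
    ],
})

# For each row, keep the elements of PDPre1 that contain the UX key as a substring.
df["Output"] = [
    [item for item in items if key in item]
    for key, items in zip(df["UX"], df["PDPre1"])
]

print(df["Output"].tolist())
```

zip pairs each UX value with the list from the same row, so no row-wise apply is needed.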
CodePudding user response:
We can use a simple list comprehension with zip:
df['out'] = [[z for z in y if x in z] for x, y in zip(df['UX'],df['PDPre1'])]
df['out'].tolist()
Out[247]: [['H_U3', 'Y_U3_O'], ['H_U4'], ['Y_U2']]
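One caveat with the in test: it matches substrings, so a key such as U1 would also match an element like H_U10. If whole-token matches are wanted, one possible sketch is to split each element on the underscore first. The helper name filter_by_token, the assumption that _ is the separator, and the H_U10 element are all made up for illustration and are not from the question:

```python
def filter_by_token(key, items, sep="_"):
    """Keep items where key equals one of the sep-separated tokens (hypothetical helper)."""
    return [item for item in items if key in item.split(sep)]


ux = ["U3", "U1"]
pdpre1 = [
    ["H_U1", "H_U8", "Y_U2", "H_U3", "Y_U3_O"],
    ["H_U1", "H_U10", "Y_U2"],  # H_U10 is invented to show the difference
]

# Same zip pattern as above, but matching whole tokens instead of substrings.
out = [filter_by_token(key, items) for key, items in zip(ux, pdpre1)]
print(out)
```

With a dataframe, the same helper could be used as [filter_by_token(a, b) for a, b in zip(df['UX'], df['PDPre1'])].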
CodePudding user response:
Assuming PDPre1 holds lists, you can use apply:
df['PDPre1'] = df.apply(lambda r: [e for e in r['PDPre1'] if r['UX'] in e], axis=1)
or using a list comprehension for performance:
df['PDPre1'] = [[e for e in p if ux in e] for p, ux in zip(df['PDPre1'], df['UX'])]
output:
Index UX PDPre1
0 0 U3 [H_U3, Y_U3_O]
1 1 U4 [H_U4]
2 2 U2 [Y_U2]
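If PDPre1 actually holds strings that only look like lists (for example after reading from a CSV file), the comprehension iterates over single characters and yields empty lists. A minimal sketch, assuming each string is a valid Python list literal, is to parse the column with ast.literal_eval first:

```python
import ast

import pandas as pd

# Assumption: PDPre1 was loaded as text, e.g. from a CSV file.
df = pd.DataFrame({
    "UX": ["U3", "U4"],
    "PDPre1": [
        '["H_U1", "H_U8", "Y_U2", "H_U3", "Y_U3_O"]',
        '["H_U1", "H_U4", "Y_U2", "H_U3", "Y_U3"]',
    ],
})

# Parse each string into a real list; literal_eval only accepts Python literals.
parsed = df["PDPre1"].apply(ast.literal_eval)

# Filter the parsed lists by the UX key of the same row.
df["Output"] = [[e for e in p if ux in e] for p, ux in zip(parsed, df["UX"])]
print(df["Output"].tolist())
```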