Python: how to filter the values in a DataFrame cell based on another column?

I have a DataFrame:

Index   UX    PDPre1                             
0       U3    ["H_U1", "H_U8", "Y_U2", "H_U3", "Y_U3_O"]
1       U4    ["H_U1", "H_U4", "Y_U2", "H_U3", "Y_U3"]
2       U2    ["H_U1", "H_U8", "Y_U2", "H_U3", "Y_U3"]
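For anyone who wants to reproduce this, here is a sketch that builds the sample DataFrame. It assumes that PDPre1 holds real Python lists of strings, not string representations of lists. It also treats Index as an ordinary column, which matches the printed output in the answers below.

```python
import pandas as pd

# Sample data from the question; each PDPre1 cell holds a list of strings
df = pd.DataFrame({
    "Index": [0, 1, 2],
    "UX": ["U3", "U4", "U2"],
    "PDPre1": [
        ["H_U1", "H_U8", "Y_U2", "H_U3", "Y_U3_O"],
        ["H_U1", "H_U4", "Y_U2", "H_U3", "Y_U3"],
        ["H_U1", "H_U8", "Y_U2", "H_U3", "Y_U3"],
    ],
})
print(df)
```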

I would like to filter the elements in PDPre1 based on the keyword in the UX column. Expected output:

Index   UX    PDPre1                                        Output
0       U3    ["H_U1", "H_U8", "Y_U2", "H_U3", "Y_U3_O"]    ["H_U3", "Y_U3_O"]
1       U4    ["H_U1", "H_U4", "Y_U2", "H_U3", "Y_U3"]      ["H_U4"]
2       U2    ["H_U1", "H_U8", "Y_U2", "H_U3", "Y_U3"]      ["Y_U2"]

I tried the code below, but it fails:

df["Output"]=list(filter(lambda x: df[df.UX] in x, df.PDPre1))

Please help. Thanks!
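The attempt fails for two reasons. First, df[df.UX] tries to select columns named U3, U4 and U2, and those columns do not exist. Second, filter runs over the whole PDPre1 column, so it sees each row's entire list instead of the strings inside it. The sketch below keeps the filter idea but runs it once per row, pairing each keyword with its list through zip. It assumes the list-valued PDPre1 column shown above.

```python
import pandas as pd

df = pd.DataFrame({
    "UX": ["U3", "U4", "U2"],
    "PDPre1": [
        ["H_U1", "H_U8", "Y_U2", "H_U3", "Y_U3_O"],
        ["H_U1", "H_U4", "Y_U2", "H_U3", "Y_U3"],
        ["H_U1", "H_U8", "Y_U2", "H_U3", "Y_U3"],
    ],
})

# Apply filter per row: keep the strings that contain that row's keyword.
# list() consumes the filter immediately, so each lambda sees the current key.
df["Output"] = [
    list(filter(lambda s: key in s, items))
    for key, items in zip(df["UX"], df["PDPre1"])
]
print(df["Output"].tolist())
# [['H_U3', 'Y_U3_O'], ['H_U4'], ['Y_U2']]
```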

Answer:

Use a nested list comprehension that tests for a substring match with the in operator. This should be preferred over apply(axis=1) solutions for performance:

df["Output"] = [[x for x in b if a in x] for a, b in zip(df['UX'], df.PDPre1)]
print(df)
   Index  UX                            PDPre1          Output
0      0  U3  [H_U1, H_U8, Y_U2, H_U3, Y_U3_O]  [H_U3, Y_U3_O]
1      1  U4    [H_U1, H_U4, Y_U2, H_U3, Y_U3]          [H_U4]
2      2  U2    [H_U1, H_U8, Y_U2, H_U3, Y_U3]          [Y_U2]
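You can check the performance claim on your own data with the rough sketch below. It confirms that the list comprehension and apply(axis=1) return the same lists, then times both with the standard timeit module. The frame size (the three sample rows repeated 10,000 times) is an arbitrary assumption. Timings depend on the machine and the pandas version, so no numbers are claimed here.

```python
import timeit
import pandas as pd

# Repeat the three sample rows to get a larger frame (size chosen arbitrarily)
base = pd.DataFrame({
    "UX": ["U3", "U4", "U2"],
    "PDPre1": [
        ["H_U1", "H_U8", "Y_U2", "H_U3", "Y_U3_O"],
        ["H_U1", "H_U4", "Y_U2", "H_U3", "Y_U3"],
        ["H_U1", "H_U8", "Y_U2", "H_U3", "Y_U3"],
    ],
})
df = pd.concat([base] * 10_000, ignore_index=True)

def with_comprehension(frame):
    # Pair each keyword with its list and keep the matching strings
    return [[x for x in b if a in x] for a, b in zip(frame["UX"], frame["PDPre1"])]

def with_apply(frame):
    # Same filter, but pandas builds a row Series and calls the lambda per row
    return frame.apply(lambda r: [x for x in r["PDPre1"] if r["UX"] in x], axis=1).tolist()

# Both approaches should give identical results
assert with_comprehension(df) == with_apply(df)

for fn in (with_comprehension, with_apply):
    seconds = timeit.timeit(lambda: fn(df), number=3)
    print(f"{fn.__name__}: {seconds:.3f} s for 3 runs")
```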

Answer:

We can do a simple loop over zip, written as a list comprehension:

df['out'] = [[z for z in y if x in z] for x, y in zip(df['UX'], df['PDPre1'])]
print(df['out'].tolist())
[['H_U3', 'Y_U3_O'], ['H_U4'], ['Y_U2']]
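The comprehension is a compact form of a loop over zip. The sketch below writes it out as explicit for loops, which can be easier to extend, for example with extra matching rules. It assumes the same list-valued PDPre1 column.

```python
import pandas as pd

df = pd.DataFrame({
    "UX": ["U3", "U4", "U2"],
    "PDPre1": [
        ["H_U1", "H_U8", "Y_U2", "H_U3", "Y_U3_O"],
        ["H_U1", "H_U4", "Y_U2", "H_U3", "Y_U3"],
        ["H_U1", "H_U8", "Y_U2", "H_U3", "Y_U3"],
    ],
})

out = []
for key, items in zip(df["UX"], df["PDPre1"]):
    # Keep only the strings in this row's list that contain the row's keyword
    matches = []
    for item in items:
        if key in item:
            matches.append(item)
    out.append(matches)

df["out"] = out
print(df["out"].tolist())
# [['H_U3', 'Y_U3_O'], ['H_U4'], ['Y_U2']]
```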

Answer:

Assuming the PDPre1 cells hold lists, you can use apply:

df['PDPre1'] = df.apply(lambda r: [e for e in r['PDPre1'] if r['UX'] in e], axis=1)

or, for better performance, use a list comprehension:

df['PDPre1'] = [[e for e in p if ux in e] for p, ux in zip(df['PDPre1'], df['UX'])]

output:

   Index  UX          PDPre1
0      0  U3  [H_U3, Y_U3_O]
1      1  U4          [H_U4]
2      2  U2          [Y_U2]
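The answers above assume that each PDPre1 cell is a Python list. If the frame was loaded from a CSV file, the cells are usually strings such as '["H_U1", "H_U4"]'. Iterating over a string yields single characters, so the comprehension would then return empty lists. The sketch below, which uses hypothetical string-valued input for illustration, first parses each cell with ast.literal_eval from the standard library.

```python
import ast
import pandas as pd

# Hypothetical input where PDPre1 was read back as text, e.g. from a CSV file
df = pd.DataFrame({
    "UX": ["U3", "U4", "U2"],
    "PDPre1": [
        '["H_U1", "H_U8", "Y_U2", "H_U3", "Y_U3_O"]',
        '["H_U1", "H_U4", "Y_U2", "H_U3", "Y_U3"]',
        '["H_U1", "H_U8", "Y_U2", "H_U3", "Y_U3"]',
    ],
})

# Parse each string into a real list; literal_eval only evaluates Python literals
df["PDPre1"] = df["PDPre1"].map(ast.literal_eval)

# Same filtering as in the answers, now on real lists
df["Output"] = [[e for e in p if ux in e] for p, ux in zip(df["PDPre1"], df["UX"])]
print(df["Output"].tolist())
# [['H_U3', 'Y_U3_O'], ['H_U4'], ['Y_U2']]
```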