Filter dataframe if value of another dataframe column exists in column of list

I have two dataframes, one has a column that contains a list of values and the other one has some values.

I want to filter the main df if one of the values in the second df exists in the main df column.

Code:

import pandas as pd

A = pd.DataFrame({'index':[0,1,2,3,4], 'vals':[[1,2],[5,4],[7,1,26],['-'],[9,8,5]]})
B = pd.DataFrame({'index':[4,7], 'val':[1,8]})

print(A)
print(B)
print(B['val'].isin(A['vals']))  # Will not work since its comparing element to list
result = pd.DataFrame({'index':[0,2,4], 'vals':[[1,2],[7,1,26],[9,8,5]]})

Dataframe A

index	vals
0	[1, 2]
1	[5, 4]
2	[7, 1, 26]
3	[-]
4	[9, 8, 5]

Dataframe B

index	val
4	1
7	8

Result

index	vals
0	[1, 2]
2	[7, 1, 26]
4	[9, 8, 5]

CodePudding user response：

You can explode your vals column then compute the intersection:

>>> A.loc[A['vals'].explode().isin(B['val']).loc[lambda x: x].index]
   index        vals
0      0      [1, 2]
2      2  [7, 1, 26]
4      4   [9, 8, 5]

Detail about explode:

>>> A['vals'].explode()
0     1
0     2
1     5
1     4
2     7  # not in B  -|
2     1  # in B       | -> keep index 2
2    26  # not in B  -|
3     -
4     9
4     8
4     5
Name: vals, dtype: object

CodePudding user response：

You can use:

# mask the values based on the intersection between the list in each row and B values
mask = A['vals'].apply(lambda a: len(list(set(a) & set(B['val'])))) > 0

result = A[mask]

print(result)

Output:

index   vals
0   0   [1, 2]
2   2   [7, 1, 26]
4   4   [9, 8, 5]