I have two dataframes, one has a column that contains a list of values and the other one has some values.
I want to filter the main df if one of the values in the second df exists in the main df column.
Code:
import pandas as pd
A = pd.DataFrame({'index':[0,1,2,3,4], 'vals':[[1,2],[5,4],[7,1,26],['-'],[9,8,5]]})
B = pd.DataFrame({'index':[4,7], 'val':[1,8]})
print(A)
print(B)
print(B['val'].isin(A['vals'])) # Will not work since its comparing element to list
result = pd.DataFrame({'index':[0,2,4], 'vals':[[1,2],[7,1,26],[9,8,5]]})
Dataframe A
index | vals |
---|---|
0 | [1, 2] |
1 | [5, 4] |
2 | [7, 1, 26] |
3 | [-] |
4 | [9, 8, 5] |
Dataframe B
index | val |
---|---|
4 | 1 |
7 | 8 |
Result
index | vals |
---|---|
0 | [1, 2] |
2 | [7, 1, 26] |
4 | [9, 8, 5] |
CodePudding user response:
You can explode your vals
column then compute the intersection:
>>> A.loc[A['vals'].explode().isin(B['val']).loc[lambda x: x].index]
index vals
0 0 [1, 2]
2 2 [7, 1, 26]
4 4 [9, 8, 5]
Detail about explode
:
>>> A['vals'].explode()
0 1
0 2
1 5
1 4
2 7 # not in B -|
2 1 # in B | -> keep index 2
2 26 # not in B -|
3 -
4 9
4 8
4 5
Name: vals, dtype: object
CodePudding user response:
You can use:
# mask the values based on the intersection between the list in each row and B values
mask = A['vals'].apply(lambda a: len(list(set(a) & set(B['val'])))) > 0
result = A[mask]
print(result)
Output:
index vals
0 0 [1, 2]
2 2 [7, 1, 26]
4 4 [9, 8, 5]