How to check if elements in a Pandas Series of lists are all part of another list?

Time:09-16

I have a Pandas Series of lists of arbitrary length:

s = pd.Series([[1,2,3], [4,6], [7,8,9,10]])

and a list of elements

l = [1,2,3,6,7,8]

I want to return every element of the series s whose values are all contained in l, and None otherwise. I want to do something like this, but applied to each list in the series:

s.where(s.isin(l), None)

So the output would be a series:

pd.Series([[1,2,3], None, None])

CodePudding user response:

You can use the magic of Python sets:

s.apply(set(l).issuperset)

Output:

0     True
1    False
2    False
dtype: bool

Then use where, with the previous output as the mask, to replace the non-matching rows with None:

s.where(s.apply(set(l).issuperset), None)

Output:

0    [1, 2, 3]
1         None
2         None
dtype: object
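
As a minimal sketch of the same idea wrapped in a reusable function (the name keep_if_subset and the extra empty list in the sample data are my own additions, not from the question): the set is built once, and each list is then tested with issuperset. Note that an empty list counts as matching here, since every one of its zero items is in l.

```python
import pandas as pd


def keep_if_subset(series, allowed):
    """Keep each list whose items are all in `allowed`; replace the rest with None.

    Hypothetical helper for illustration; the list items must be hashable.
    """
    allowed_set = set(allowed)  # built once, so each membership test is a hash lookup
    mask = series.apply(allowed_set.issuperset)  # True where the whole list is covered
    return series.where(mask, None)


# Sample data from the question, plus an empty list (an added edge case)
s = pd.Series([[1, 2, 3], [4, 6], [7, 8, 9, 10], []])
l = [1, 2, 3, 6, 7, 8]

mask = s.apply(set(l).issuperset)
print(mask.tolist())  # [True, False, False, True]
print(keep_if_subset(s, l))
```

If an empty list should not count as a match, the mask would need an extra length check.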

CodePudding user response:

You can explode the series, use isin with l, and then use all with the parameter level=0 (equivalent to groupby.all on the index).

print(s.explode().isin(l).all(level=0))
0     True
1    False
2    False
dtype: bool

Use this Boolean mask in where to get your expected result:

s1 = s.where(s.explode().isin(l).all(level=0), None)
print(s1)
0    [1, 2, 3]
1         None
2         None
dtype: object

Thanks to a comment from @mozway: the level parameter of all is deprecated (and was removed in pandas 2.0), so the solution should use groupby.all instead:

s1 = s.where(s.explode().isin(l).groupby(level=0).all(), None)
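
To make the intermediate steps of the groupby version easier to follow, here is a small sketch that splits it into named pieces. The variable names and the extra empty list are my own additions, and it assumes the index of s has unique labels, since the grouping relies on them. One thing to be aware of: an empty list explodes to a single NaN row, so this approach treats it as not matching.

```python
import pandas as pd

# Sample data from the question, plus an empty list (an added edge case)
s = pd.Series([[1, 2, 3], [4, 6], [7, 8, 9, 10], []])
l = [1, 2, 3, 6, 7, 8]

exploded = s.explode()              # one row per item; the original index label repeats
in_l = exploded.isin(l)             # element-wise membership test (NaN is not in l)
mask = in_l.groupby(level=0).all()  # True only if every item of a row is in l

print(mask.tolist())  # [True, False, False, False]

s1 = s.where(mask, None)
print(s1)
```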

CodePudding user response:

@TomNash, you can combine the all function with a generator expression inside a loop:

s = pd.Series([[1,2,3], [4,5,6], [7,8,9]])

l = [1,2,3,6,7,8]

final_list = []
for x in s:
    if all(item in l for item in x):
        final_list.append(x)
    else:
        final_list.append(None)

print(final_list)

OUTPUT:

[[1, 2, 3], None, None]
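
If you prefer an actual list comprehension, the loop above could be condensed roughly like this. This is only a sketch: converting l to a set (the name allowed is mine) is an optional tweak that speeds up the membership test, and the last line rebuilds a Series in case that is what you need.

```python
import pandas as pd

s = pd.Series([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
l = [1, 2, 3, 6, 7, 8]

allowed = set(l)  # hash lookups instead of scanning the list each time

# Keep x when every item is allowed, otherwise put None in its place
final_list = [x if all(item in allowed for item in x) else None for x in s]
print(final_list)  # [[1, 2, 3], None, None]

# Optional: turn the result back into a Series with the original index
result = pd.Series(final_list, index=s.index, dtype=object)
```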

CodePudding user response:

import pandas as pd

s = pd.Series([[1,2,3], [4,6], [7,8,9,10]])
l = [1,2,3,6,7,8]
new_series = []
for i in range(len(s)):
    # count the values of this list that are missing from l
    s_in_l = 0
    for j in range(len(s[i])):
        if s[i][j] not in l:
            s_in_l = s_in_l + 1
    # keep the list only if no value was missing
    if s_in_l == 0:
        new_series.append(s[i])
    else:
        new_series.append(None)
new_series = pd.Series(new_series)
print(new_series)

output:

0    [1, 2, 3]
1         None
2         None
dtype: object

CodePudding user response:

You can check whether each element of s is a subset of l with the .issubset method, as follows:

s.apply(lambda x: x if set(x).issubset(l) else None)

or make use of the NumPy function setdiff1d (this assumes import numpy as np), as follows:

s.apply(lambda x: x if (len(np.setdiff1d(x, l)) == 0) else None)

Result:

0    [1, 2, 3]
1         None
2         None
dtype: object
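
For completeness, a self-contained sketch that runs both variants side by side with the imports they need. The names by_set and by_numpy are mine; setdiff1d returns the values of its first argument that are missing from the second, so an empty result means the list is fully contained in l.

```python
import numpy as np
import pandas as pd

s = pd.Series([[1, 2, 3], [4, 6], [7, 8, 9, 10]])
l = [1, 2, 3, 6, 7, 8]

# Variant 1: set-based subset test
by_set = s.apply(lambda x: x if set(x).issubset(l) else None)

# Variant 2: keep the list when nothing in it is missing from l
by_numpy = s.apply(lambda x: x if len(np.setdiff1d(x, l)) == 0 else None)

print(np.setdiff1d([4, 6], l))  # [4]  (4 is the value that is not in l)
print(by_set)
print(by_numpy)
```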