Return True/False if Series of Lists contains all elements of Reference List

I have a series of lists:

s = pd.Series([(1, 2, 3), (4, 5, 6), (1, 2, 3, 4), (8, 9, 10)])

I would like to check whether any list in the series contains all elements of a reference list:

l = [4, 5]

The function would return True because the 2nd list in the series satisfies the criteria.

Ideas on how to implement this? I have tried the following to no avail:

def contains_valid_data():
        return all(x in s for x in l)

def contains_valid_data():
        return set(l).issubset(s)

CodePudding user response：

s = pd.Series([(1, 2, 3), (4, 5, 6), (1, 2, 3, 4), (8, 9, 10)])
l = [4, 5]

def checkList(s, l):
    # Keep the rows whose intersection with l covers every element of l.
    return s[s.apply(lambda x: len(set(l).intersection(set(x))) == len(l))]

checkList(s, l)

1    (4, 5, 6)
dtype: object
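
Note that checkList returns the matching rows rather than a boolean. Below is a minimal sketch of how that result could be reduced to a single True/False. The name any_row_contains is hypothetical and not part of the original answer, and the length check is made against set(l) on the assumption that l may contain duplicates:

```python
import pandas as pd

s = pd.Series([(1, 2, 3), (4, 5, 6), (1, 2, 3, 4), (8, 9, 10)])
l = [4, 5]

def any_row_contains(s, l):
    # Hypothetical helper: True if at least one row holds every element of l.
    wanted = set(l)  # de-duplicate so the length comparison stays valid
    matches = s[s.apply(lambda x: len(wanted.intersection(x)) == len(wanted))]
    # An empty selection means no row satisfied the criteria.
    return not matches.empty

print(any_row_contains(s, l))       # True
print(any_row_contains(s, [4, 9]))  # False
```
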

CodePudding user response：

I'm not sure I understand the expected output, but you can try this:

import pandas as pd
s = pd.Series([[1, 2, 3], [4, 5, 6], [1, 2, 3, 4], [8, 9, 10]])
l = [4, 5]

out = s.apply(lambda x: set(l).issubset(x))

print(out)
0    False
1     True
2    False
3    False
dtype: bool

Or, if you want to return a single boolean value, you can use this:

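def contains_valid_data():
    # any() collapses the per-row results into one boolean, so the function
    # returns False (instead of None) when no row matches.
    return bool(s.apply(lambda x: set(l).issubset(x)).any())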

contains_valid_data()
True

CodePudding user response：

You could use pandas.Series.map (with a lambda expression or a function) to check whether each row contains all the elements of l, which turns the series into a boolean series. Then use pandas.Series.any to reduce those booleans to a single value with a logical OR.

# s = pd.Series([(1, 2, 3), (4, 5, 6), (1, 2, 3, 4), (8, 9, 10)])
# l = [4, 5]

l = set(l)  # Transform the list into a `set` for better performance. 
is_some_row_containing_l = s.map(lambda ss: set(ss).issuperset(l)).any()
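
For reference, here is a self-contained sketch of the same idea wrapped in a function, so that l is not rebound to a set at module level. The name contains_all is hypothetical. It swaps map(...).any() for Python's built-in any() over the Series values, which stops at the first matching row:

```python
import pandas as pd

def contains_all(s, l):
    # Hypothetical helper: True if any row of s contains every element of l.
    wanted = set(l)
    # Iterating over a Series yields its values (here, the tuples).
    return any(wanted.issubset(row) for row in s)

s = pd.Series([(1, 2, 3), (4, 5, 6), (1, 2, 3, 4), (8, 9, 10)])

print(contains_all(s, [4, 5]))  # True
print(contains_all(s, [4, 9]))  # False
```

Whether the early exit matters depends on the data. For a short series like this one, both versions behave the same.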