I have a series of lists:
s = pd.Series([(1, 2, 3), (4, 5, 6), (1, 2, 3, 4), (8, 9, 10)])
I would like to check whether any list in the series contains all elements of a reference list:
l = [4, 5]
The function should return True, because the 2nd list in the series satisfies the criterion.
Ideas on how to implement this? I have tried the following to no avail:
def contains_valid_data():
    return all(x in s for x in l)
def contains_valid_data():
    return set(l).issubset(s)
CodePudding user response:
s = pd.Series([(1, 2, 3), (4, 5, 6), (1, 2, 3, 4), (8, 9, 10)])
l = [4, 5]
def checkList(s, l):
    # Keep the rows whose intersection with l covers every element of l.
    return s[s.apply(lambda x: len(set(l).intersection(set(x))) == len(l))]
checkList(s, l)
1    (4, 5, 6)
dtype: object
CodePudding user response:
I'm not sure I understand the expected output, but you can try this:
import pandas as pd
s = pd.Series([[1, 2, 3], [4, 5, 6], [1, 2, 3, 4], [8, 9, 10]])
l = [4, 5]
out = s.apply(lambda x: set(l).issubset(x))
print(out)
0    False
1     True
2    False
3    False
dtype: bool
Or, if you want a single boolean value, you can use this:
def contains_valid_data():
    # .any() is True if at least one row contains every element of l, else False.
    return bool(s.apply(lambda x: set(l).issubset(x)).any())
contains_valid_data()
True
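As a side note, here is a small sketch of why the two attempts in the question probably come back False. It assumes the tuple-based series from the question with its default integer index; the variable names are only for illustration.

```python
import pandas as pd

s = pd.Series([(1, 2, 3), (4, 5, 6), (1, 2, 3, 4), (8, 9, 10)])
l = [4, 5]

# `in` on a Series tests the index labels (0..3 here), not the stored values.
label_hit = 1 in s   # 1 is an index label
value_hit = 4 in s   # 4 is a value inside a row, not an index label

# issubset(s) iterates over the Series values, which are whole tuples,
# so 4 and 5 are compared against tuples and never match.
subset_of_rows = set(l).issubset(s)

# Testing each row separately gives the intended answer.
per_row = any(set(l).issubset(row) for row in s)

print(label_hit, value_hit, subset_of_rows, per_row)
```

So the check has to be applied to each row, and the per-row results combined afterwards.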
CodePudding user response:
You could use pandas.Series.map (with a lambda expression or a function) to check whether each row contains all the elements in l, which turns this pd.Series into a boolean series. Then simply use pandas.Series.any to aggregate the boolean values with an OR operation.
# s = pd.Series([(1, 2, 3), (4, 5, 6), (1, 2, 3, 4), (8, 9, 10)])
# l = [4, 5]
l = set(l) # Transform the list into a `set` for better performance.
is_some_row_containing_l = s.map(lambda ss: set(ss).issuperset(l)).any()
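For a version that runs on its own, the same map-then-any idea can be wrapped in a function. This is only a sketch: the name any_row_contains_all and its parameters are made up for illustration, and the second call uses a reference list ([4, 7]) that is not in the question, just to show the negative case.

```python
import pandas as pd

def any_row_contains_all(series, required):
    """Return True if at least one row holds every element of `required`."""
    required = set(required)  # build the set once, outside the lambda
    # map gives one boolean per row; any() combines them with OR.
    return bool(series.map(lambda row: required.issubset(row)).any())

s = pd.Series([(1, 2, 3), (4, 5, 6), (1, 2, 3, 4), (8, 9, 10)])

found = any_row_contains_all(s, [4, 5])    # row 1 is (4, 5, 6)
missing = any_row_contains_all(s, [4, 7])  # no row holds both 4 and 7

print(found, missing)
```

Passing the series and the reference list as arguments avoids relying on globals, and bool() converts the NumPy boolean returned by any() into a plain Python bool.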