I have a Pandas Series of lists of arbitrary length:
s = pd.Series([[1,2,3], [4,6], [7,8,9,10]])
and a list of elements:
l = [1,2,3,6,7,8]
I want to return each element of the series s whose values are all contained in l, and None otherwise. I want to do something like this, but applied to each element of the series:
s.where(s.isin(l), None)
So the output would be a series:
pd.Series([[1,2,3], None, None])
CodePudding user response:
You can use the magic of Python sets:
s.apply(set(l).issuperset)
Output:
0 True
1 False
2 False
dtype: bool
Then use where to modify the non-matching rows, using the previous output as a mask:
s.where(s.apply(set(l).issuperset), None)
Output:
0 [1, 2, 3]
1 None
2 None
dtype: object
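For reference, here is a self-contained sketch of the same idea, using the data from the question. The name allowed is only illustrative; building the set once up front avoids re-creating it and gives fast membership checks.

```python
import pandas as pd

s = pd.Series([[1, 2, 3], [4, 6], [7, 8, 9, 10]])
l = [1, 2, 3, 6, 7, 8]

# Build the lookup set once (illustrative name, not from the original answer).
allowed = set(l)

# True for rows whose items are all members of l.
mask = s.apply(allowed.issuperset)

# Keep matching rows, replace the others with None.
result = s.where(mask, None)
print(result)
```

Note that issuperset accepts any iterable, so the lists do not need to be converted to sets first.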
CodePudding user response:
You can explode the series, use isin with l, and then use all with the parameter level=0 (equivalent to groupby.all on the index):
print(s.explode().isin(l).all(level=0))
0 True
1 False
2 False
dtype: bool
Use this Boolean mask in where to get your expected result:
s1 = s.where(s.explode().isin(l).all(level=0), None)
print(s1)
0 [1, 2, 3]
1 None
2 None
dtype: object
Thanks to a comment from @mozway: the level=0 parameter of all is deprecated (and was removed in pandas 2.0), so the solution should use groupby.all instead:
s1 = s.where(s.explode().isin(l).groupby(level=0).all(), None)
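One edge case worth knowing about, shown as a small sketch with an extra empty list added to the question's data (that row is an assumption for illustration only): explode turns an empty list into a NaN row, so the explode-based mask marks it as False, while a set-based check treats an empty list as trivially contained in l.

```python
import pandas as pd

# Same idea as above, with an empty list added to show the edge case.
s = pd.Series([[1, 2, 3], [4, 6], [], [7, 8]])
l = [1, 2, 3, 6, 7, 8]

# One row per list item; the original index is repeated for each item.
exploded = s.explode()

# Per-item membership, then "all items match" per original row.
mask = exploded.isin(l).groupby(level=0).all()

# Set-based mask for comparison.
set_mask = s.apply(set(l).issuperset)

print(mask.tolist())      # [True, False, False, True]
print(set_mask.tolist())  # [True, False, True, True]
```

Which behaviour is preferable depends on how empty lists should be treated in your data.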
CodePudding user response:
@TomNash, you can combine the all function with a generator expression inside a loop:
s = pd.Series([[1,2,3], [4,5,6], [7,8,9]])
l = [1,2,3,6,7,8]
final_list = []
for x in s:
    if all(item in l for item in x):
        final_list.append(x)
    else:
        final_list.append(None)
print(final_list)
OUTPUT:
[[1, 2, 3], None, None]
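The same logic can also be written as a single list comprehension. This is just a compact sketch of the loop, with l converted to a set (the name allowed is illustrative) so that each membership test does not scan the whole list:

```python
import pandas as pd

s = pd.Series([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
l = [1, 2, 3, 6, 7, 8]

# Set lookup instead of scanning the list for every item.
allowed = set(l)

# Keep a row if all of its items are allowed, otherwise use None.
final_list = [x if all(item in allowed for item in x) else None for x in s]
print(final_list)  # [[1, 2, 3], None, None]
```

If a Series is needed rather than a list, pd.Series(final_list, index=s.index) keeps the original index.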
CodePudding user response:
s = pd.Series([[1,2,3], [4,6], [7,8,9,10]])
l = [1,2,3,6,7,8]
new_series = []
for i in range(len(s)):
    # count the items of s[i] that are missing from l
    s_in_l = 0
    for j in range(len(s[i])):
        if s[i][j] not in l:
            s_in_l = s_in_l + 1
    # keep the list only if nothing was missing
    if s_in_l == 0:
        new_series.append(s[i])
    else:
        new_series.append(None)
new_series = pd.Series(new_series)
print(new_series)
output:
0 [1, 2, 3]
1 None
2 None
dtype: object
CodePudding user response:
You can check whether each element of s is a subset of l with the .issubset method, as follows:
s.apply(lambda x: x if set(x).issubset(l) else None)
Or use the NumPy function setdiff1d (this needs import numpy as np), as follows:
s.apply(lambda x: x if (len(np.setdiff1d(x, l)) == 0) else None)
Result:
0 [1, 2, 3]
1 None
2 None
dtype: object