How to test whether a pandas series contains elements from another list (or NumPy array or pandas se

Time:06-30

Assume that I have this DataFrame (Animals column is of type pandas.Series):

ID Animals
1 [cat, dog, chicken]
2 [penguin]

And these lists (they can be NumPy arrays or pandas Series if that is better for performance):

mammals = ['cat', 'dog', 'cow', 'sheep']
birds = ['chicken', 'duck', 'penguin']

I want to add two columns to my DataFrame, ContainsBirds and ContainsMammals, based on the contents of the Animals column.

Here is the final expected output:

ID Animals ContainsBirds ContainsMammals
1 [cat, dog, chicken] 1.0 1.0
2 [penguin] 1.0 0.0
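
To make the question reproducible, here is a minimal sketch of how the input could be built. It assumes each cell of the Animals column is a plain Python list, which the question does not state explicitly:

```python
import pandas as pd

# Assumed construction of the frame shown above:
# each Animals cell holds a plain Python list.
df = pd.DataFrame({
    "ID": [1, 2],
    "Animals": [["cat", "dog", "chicken"], ["penguin"]],
})

mammals = ["cat", "dog", "cow", "sheep"]
birds = ["chicken", "duck", "penguin"]

print(df)
```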

CodePudding user response:

You can put the lists in a dictionary and, for each one, test whether a row shares at least one value with it by converting the list to a set and calling isdisjoint. Negating the result gives a boolean mask; cast it to float for 0.0 and 1.0, or use .astype(int) for 0 and 1:

d = {'Birds': birds, 'Mammals': mammals}

for k, v in d.items():
    # isdisjoint is True when a row shares no value with v, so negate it
    df[f'Contains{k}'] = (~df['Animals'].map(set(v).isdisjoint)).astype(float)
print(df)
   ID              Animals  ContainsBirds  ContainsMammals
0   1  [cat, dog, chicken]            1.0              1.0
1   2            [penguin]            1.0              0.0
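
Since the question mentions performance, here is a sketch of a different technique, not the one used above: flatten the lists with Series.explode, test membership with isin, and collapse back to one flag per row with a groupby on the original index. The sample frame and the `groups` name are assumptions for illustration, and I have not timed this against the set-based version, so measure on your own data before choosing:

```python
import pandas as pd

# Assumed sample data mirroring the question.
df = pd.DataFrame({
    "ID": [1, 2],
    "Animals": [["cat", "dog", "chicken"], ["penguin"]],
})
mammals = ["cat", "dog", "cow", "sheep"]
birds = ["chicken", "duck", "penguin"]

groups = {"Birds": birds, "Mammals": mammals}  # hypothetical helper dict

# One row per animal; the original row label is repeated in the index.
exploded = df["Animals"].explode()

for name, members in groups.items():
    # True for each animal found in `members`, then "any" per original row.
    hit = exploded.isin(members).groupby(level=0).any()
    df[f"Contains{name}"] = hit.astype(float)

print(df)
```

Rows with an empty list explode to a single NaN, which isin treats as not found, so they end up with 0.0 in both columns.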

CodePudding user response:

Using a list comprehension:

lists = [birds, mammals]
names = ['Birds', 'Mammals']

df[names] = [[int(bool(set(l).intersection(x))) for l in lists]
             for x in df['Animals']]

Output:

   ID              Animals  Birds  Mammals
0   1  [cat, dog, chicken]      1        1
1   2            [penguin]      1        0
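
For completeness, a self-contained sketch of the same idea that produces the column names asked for in the question. The sample frame and the `group_sets` and `flags` names are assumptions for illustration. It relies on an empty intersection being falsy, and it builds a small frame and joins it, which is just one way to attach the new columns:

```python
import pandas as pd

# Assumed sample data mirroring the question.
df = pd.DataFrame({
    "ID": [1, 2],
    "Animals": [["cat", "dog", "chicken"], ["penguin"]],
})
mammals = ["cat", "dog", "cow", "sheep"]
birds = ["chicken", "duck", "penguin"]

# Build each set once instead of once per row.
group_sets = {"ContainsBirds": set(birds), "ContainsMammals": set(mammals)}

# An empty intersection is falsy, so bool(...) is True only on a shared value.
rows = [
    [int(bool(s.intersection(animals))) for s in group_sets.values()]
    for animals in df["Animals"]
]

flags = pd.DataFrame(rows, columns=list(group_sets), index=df.index)
df = df.join(flags)

print(df)
```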