Assume that I have this DataFrame (the Animals column is a pandas.Series whose values are lists):
ID | Animals |
---|---|
1 | [cat, dog, chicken] |
2 | [penguin] |
And these lists (they could also be NumPy arrays or pandas Series if that is better for performance):
mammals = ['cat', 'dog', 'cow', 'sheep']
birds = ['chicken', 'duck', 'penguin']
I am trying to add two columns to my DataFrame, ContainsBirds and ContainsMammals, based on the contents of the Animals column.
Here is the final expected output:
ID | Animals | ContainsBirds | ContainsMammals |
---|---|---|---|
1 | [cat, dog, chicken] | 1.0 | 1.0 |
2 | [penguin] | 1.0 | 0.0 |
Answer:
You can build a dictionary of the lists and, for each one, test whether a row contains at least one of its values by converting the list to a set and negating set.isdisjoint. Casting the booleans to float gives 0.0 and 1.0; for 0 and 1, use .astype(int) instead:
d = {'Birds': birds, 'Mammals': mammals}
for k, v in d.items():
    # isdisjoint is True when the row shares no animal with v; ~ flips it to "contains at least one"
    df[f'Contains{k}'] = (~df['Animals'].map(set(v).isdisjoint)).astype(float)
print(df)
ID Animals ContainsBirds ContainsMammals
0 1 [cat, dog, chicken] 1.0 1.0
1 2 [penguin] 1.0 0.0
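For anyone who wants to try this end to end, here is a self-contained version of the snippet above. It rebuilds the sample data from the question (the exact DataFrame construction is an assumption based on the table) and uses the .astype(int) variant mentioned in the answer, so the new columns hold 0/1 instead of 0.0/1.0.

```python
import pandas as pd

# Sample data reconstructed from the question's table (assumed layout)
df = pd.DataFrame({'ID': [1, 2],
                   'Animals': [['cat', 'dog', 'chicken'], ['penguin']]})
mammals = ['cat', 'dog', 'cow', 'sheep']
birds = ['chicken', 'duck', 'penguin']

d = {'Birds': birds, 'Mammals': mammals}
for k, v in d.items():
    # set(v) is built once per category; isdisjoint is True when the row shares
    # no animal with it, ~ negates that, and astype(int) turns True/False into 1/0
    df[f'Contains{k}'] = (~df['Animals'].map(set(v).isdisjoint)).astype(int)
print(df)
```

Since set.isdisjoint accepts any iterable, the row values can stay as plain lists; only the category list is converted to a set.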
Answer:
Using a list comprehension:
lists = [birds, mammals]
names = ['Birds', 'Mammals']
# 1 if the row's list shares at least one animal with l, else 0
df[names] = [[int(bool(set(l).intersection(x))) for l in lists]
             for x in df['Animals']]
output:
ID Animals Birds Mammals
0 1 [cat, dog, chicken] 1 1
1 2 [penguin] 1 0
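Since the question mentions performance, one further option (not taken from either answer above, and not benchmarked here) is to explode the lists into one row per animal, test membership with the vectorized Series.isin, and collapse back to one flag per original row with groupby(level=0).any(). This is a minimal sketch assuming the DataFrame index is unique, because the groupby relies on the repeated original index labels; the sample data is reconstructed from the question.

```python
import pandas as pd

# Sample data reconstructed from the question's table (assumed layout)
df = pd.DataFrame({'ID': [1, 2],
                   'Animals': [['cat', 'dog', 'chicken'], ['penguin']]})
mammals = ['cat', 'dog', 'cow', 'sheep']
birds = ['chicken', 'duck', 'penguin']

# One row per animal; each element keeps its original row's index label
# (an empty list would become a single NaN, which isin treats as no match)
exploded = df['Animals'].explode()

for name, group in {'Birds': birds, 'Mammals': mammals}.items():
    # isin flags each animal, then any() per original index label gives one flag per row
    df[f'Contains{name}'] = exploded.isin(group).groupby(level=0).any().astype(float)
print(df)
```

The assignment aligns on the index, so each flag lands back on the row it came from.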