I have a NumPy array of good animals, and a DataFrame of people with a list of the animals each person owns.
import numpy as np
import pandas as pd

good_animals = np.array(['Owl', 'Dragon', 'Shark', 'Cat', 'Unicorn', 'Penguin'])
data = {
    'People': [1, 2, 3, 4, 5],
    'Animals': [['Owl'], ['Owl', 'Dragon'], ['Dog', 'Human'], ['Unicorn', 'Pitbull'], []],
}
df = pd.DataFrame(data)
I want to add another column to my DataFrame, showing all the good animals that person owns.
The following gives me a Series showing whether or not each animal is a good animal.
df['Animals'].apply(lambda x: np.isin(x, good_animals))
But I want to see the actual good animals, not just booleans.
CodePudding user response:
You can use the intersection of sets built from the lists:
df['new'] = df['Animals'].apply(lambda x: list(set(x).intersection(good_animals)))
print(df)
   People             Animals            new
0       1               [Owl]          [Owl]
1       2       [Owl, Dragon]  [Dragon, Owl]
2       3        [Dog, Human]             []
3       4  [Unicorn, Pitbull]      [Unicorn]
4       5                  []             []
If duplicate values are possible, or if order matters, use a list comprehension:
s = set(good_animals)
df['new'] = df['Animals'].apply(lambda x: [y for y in x if y in s])
print(df)
   People             Animals            new
0       1               [Owl]          [Owl]
1       2       [Owl, Dragon]  [Owl, Dragon]
2       3        [Dog, Human]             []
3       4  [Unicorn, Pitbull]      [Unicorn]
4       5                  []             []
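To make the difference between the two approaches concrete, here is a small plain-Python sketch. The helper names via_intersection and via_comprehension and the sample list owned are made up for illustration; they are not part of the question.

```python
good_animals = ['Owl', 'Dragon', 'Shark', 'Cat', 'Unicorn', 'Penguin']
good_set = set(good_animals)  # a set makes each membership test cheap

def via_intersection(animals):
    """Unique good animals; the order of the result is not guaranteed."""
    return list(set(animals).intersection(good_set))

def via_comprehension(animals):
    """Good animals in their original order, with duplicates kept."""
    return [a for a in animals if a in good_set]

# Hypothetical row containing a duplicate and a non-good animal.
owned = ['Owl', 'Dog', 'Owl', 'Dragon']

unique_good = via_intersection(owned)
ordered_good = via_comprehension(owned)

print(sorted(unique_good))  # sorted only to get a stable display
print(ordered_good)
```

The set version collapses the repeated 'Owl' into one entry, while the list comprehension keeps both and preserves the input order.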
CodePudding user response:
I'm not sure I understood your question correctly. Why are you using np.array? You can try this:
import pandas as pd

good_animals = ['Owl', 'Dragon', 'Shark', 'Cat', 'Unicorn', 'Penguin']
df_dict = {
    'People': ["1", "2", "3", "4", "5"],
    'Animals': [['Owl'], ['Owl', 'Dragon'], ['Dog', 'Human'], ['Unicorn', 'Pitbull'], []],
    'Good_animals': [None, None, None, None, None]
}
df = pd.DataFrame(df_dict)
for x in range(df.shape[0]):
    # Assign through df.loc; the chained form df.Good_animals.iloc[x] = ... may write to a copy.
    df.loc[x, 'Good_animals'] = ', '.join([y for y in df.loc[x, 'Animals'] if y in good_animals])
The result:
   People             Animals Good_animals
0       1               [Owl]          Owl
1       2       [Owl, Dragon]  Owl, Dragon
2       3        [Dog, Human]
3       4  [Unicorn, Pitbull]      Unicorn
4       5                  []
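As a possible alternative to filling the column row by row, the same comma-separated strings could be built in a single assignment with Series.apply. This is only a sketch: good_set and result are names introduced here for illustration, and the data is copied from the question.

```python
import pandas as pd

good_animals = ['Owl', 'Dragon', 'Shark', 'Cat', 'Unicorn', 'Penguin']
good_set = set(good_animals)  # illustrative helper for fast lookups

df = pd.DataFrame({
    'People': [1, 2, 3, 4, 5],
    'Animals': [['Owl'], ['Owl', 'Dragon'], ['Dog', 'Human'], ['Unicorn', 'Pitbull'], []],
})

# Build the whole column at once instead of writing one cell per loop iteration.
df['Good_animals'] = df['Animals'].apply(
    lambda animals: ', '.join(a for a in animals if a in good_set)
)

result = df['Good_animals'].tolist()
print(df)
```

Rows with no good animals end up with an empty string, which matches the blank cells in the result above.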