The pandas isin() function, but returning the actual values, not just booleans

I have a NumPy array of good animals, and a DataFrame of people, each with a list of the animals they own.

import numpy as np
import pandas as pd

good_animals = np.array(['Owl', 'Dragon', 'Shark', 'Cat', 'Unicorn', 'Penguin'])
data = {
    'People': [1, 2, 3, 4, 5],
    'Animals': [['Owl'], ['Owl', 'Dragon'], ['Dog', 'Human'], ['Unicorn', 'Pitbull'], []],
}
df = pd.DataFrame(data)

I want to add another column to my DataFrame, showing all the good animals that person owns.

The following gives me a Series showing whether or not each animal is a good animal.

df['Animals'].apply(lambda x: np.isin(x, good_animals))

But I want to see the actual good animals, not just booleans.

CodePudding user response：

You can convert each list to a set and take its intersection with good_animals:

df['new'] = df['Animals'].apply(lambda x: list(set(x).intersection(good_animals)))
print(df)
   People             Animals            new
0       1               [Owl]          [Owl]
1       2       [Owl, Dragon]  [Dragon, Owl]
2       3        [Dog, Human]             []
3       4  [Unicorn, Pitbull]      [Unicorn]
4       5                  []             []
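
One caveat with the set approach, sketched here on a hypothetical list with a repeated entry (the `owned` list is made up for illustration and is not part of the question's data): a set keeps each value only once and does not promise any particular order, which is why row 1 above comes back as [Dragon, Owl].

```python
good_animals = ['Owl', 'Dragon', 'Shark', 'Cat', 'Unicorn', 'Penguin']

# Hypothetical input with a repeated entry (not from the question's DataFrame)
owned = ['Owl', 'Dog', 'Owl', 'Dragon']

via_set = list(set(owned).intersection(good_animals))

# The second 'Owl' is collapsed, and the element order is unspecified,
# so sort before comparing or printing.
print(sorted(via_set))  # ['Dragon', 'Owl']
```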

If the lists may contain duplicated values, or if order is important, use a list comprehension instead:

s = set(good_animals)
df['new'] = df['Animals'].apply(lambda x: [y for y in x if y in s])
print(df)
   People             Animals            new
0       1               [Owl]          [Owl]
1       2       [Owl, Dragon]  [Owl, Dragon]
2       3        [Dog, Human]             []
3       4  [Unicorn, Pitbull]      [Unicorn]
4       5                  []             []
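
If you would rather stay close to the np.isin call from the question, the boolean mask it returns can be used to index the animals once they are in an array. This is only a sketch: the helper name `good_only` is made up here, and `dtype=object` is an assumption chosen so that the empty list does not turn into a float array.

```python
import numpy as np
import pandas as pd

good_animals = np.array(['Owl', 'Dragon', 'Shark', 'Cat', 'Unicorn', 'Penguin'])
df = pd.DataFrame({
    'People': [1, 2, 3, 4, 5],
    'Animals': [['Owl'], ['Owl', 'Dragon'], ['Dog', 'Human'], ['Unicorn', 'Pitbull'], []],
})

def good_only(animals):
    """Hypothetical helper: keep only the entries that appear in good_animals."""
    # dtype=object keeps an empty list from becoming a float64 array
    arr = np.asarray(animals, dtype=object)
    mask = np.isin(arr, good_animals)  # same boolean mask as in the question
    return arr[mask].tolist()          # index with the mask to get the values

df['new'] = df['Animals'].apply(good_only)
print(df['new'].tolist())
```

Like the list comprehension, this keeps the original order and any duplicates, but it does more work per row, so the comprehension is probably the simpler choice for short lists.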

CodePudding user response：

I'm not sure I fully understood your question. Why are you using np.array? You can try this:

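import pandas as pd

good_animals = ['Owl', 'Dragon', 'Shark', 'Cat', 'Unicorn', 'Penguin']
df_dict = {
    'People': ["1", "2", "3", "4", "5"],
    'Animals': [['Owl'], ['Owl', 'Dragon'], ['Dog', 'Human'], ['Unicorn', 'Pitbull'], []],
    'Good_animals': [None, None, None, None, None],
}
df = pd.DataFrame(df_dict)

# Fill Good_animals row by row with a comma-separated string of the good animals.
for x in range(df.shape[0]):
    # df.loc writes to df directly; chained assignment such as
    # df.Good_animals.iloc[x] = ... may modify a temporary copy instead.
    df.loc[x, 'Good_animals'] = ', '.join([y for y in df.Animals.iloc[x] if y in good_animals])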

The result:

    People   Animals             Good_animals
0   1        [Owl]               Owl
1   2        [Owl, Dragon]       Owl, Dragon
2   3        [Dog, Human]   
3   4        [Unicorn, Pitbull]  Unicorn
4   5        []
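
The same comma-separated column can probably be built without the explicit loop. A minimal sketch using Series.apply, assuming the same data as above (the switch to a set for good_animals is my choice for faster lookups, not something this answer requires):

```python
import pandas as pd

# A set gives constant-time membership checks
good_animals = {'Owl', 'Dragon', 'Shark', 'Cat', 'Unicorn', 'Penguin'}

df = pd.DataFrame({
    'People': ["1", "2", "3", "4", "5"],
    'Animals': [['Owl'], ['Owl', 'Dragon'], ['Dog', 'Human'], ['Unicorn', 'Pitbull'], []],
})

# Join the good animals of each row into one string; rows with none give ''
df['Good_animals'] = df['Animals'].apply(
    lambda animals: ', '.join(a for a in animals if a in good_animals)
)
print(df['Good_animals'].tolist())  # ['Owl', 'Owl, Dragon', '', 'Unicorn', '']
```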