Elements of one column that don't have a certain value in another column-CodePudding

I have a DataFrame, let's assume the following one:

df=pd.DataFrame({'person':['Sebastian','Sebastian','Sebastian', 'Maria', 'Maria', 'Maria', 'Achim','Achim','Achim'],'item':['house','garden','sink','sink','gold','house','stone','gold','wood']})

Now I want to get a list of all persons who don't own a certain item, for example gold. I've fould a way to implement it, but I think there is a better way. That is how I have done it:

allPersons=df['person'].unique()
personWithGold=df[df['item']=='gold']['person'].unique()
personWithoutGold=allPersons[~np.isin(allPersons,personWithGold)]

Any suggestions how to improve the code? I somehow feel that there is a pretty easy one-line solution.

CodePudding user response：

With grouping by persons and further filtering:

no_gold = df.groupby('person').apply(lambda x: (x['item'].ne('gold')).all())
persons_no_gold = no_gold[no_gold].index.values

array(['Sebastian'], dtype=object)

CodePudding user response：

Using `pd.get_dummies` and indexing

You can also use pd.get_dummies and some filtering to get what you need instead of using pd.DataFrame.groupby

import pandas as pd

forbidden_item = 'gold'
pivot = pd.get_dummies(df.set_index('person')['item']).max(level=0)
result = pivot[pivot[forbidden_item]==0].index.tolist()
result

['Sebastian']

The pivot table looks like this, so you can use this to work with more complex logic.

print(pivot)

           garden  gold  house  sink  stone  wood
person                                           
Sebastian       1     0      1     1      0     0
Maria           0     1      1     1      0     0
Achim           0     1      0     0      1     1

CodePudding user response：

You can use boolean-indexing .drop_duplicates() here:

who_has_gold = df.loc[df.item.eq('gold'), 'person'].drop_duplicates()

persons_without_gold = df.loc[~df['person'].isin(who_has_gold), 'person'].drop_duplicates()
print(persons_without_gold)

Prints:

0    Sebastian
Name: person, dtype: object

Using pd.get_dummies and indexing

Using `pd.get_dummies` and indexing