I have a DataFrame, let's assume the following one:
df=pd.DataFrame({'person':['Sebastian','Sebastian','Sebastian', 'Maria', 'Maria', 'Maria', 'Achim','Achim','Achim'],'item':['house','garden','sink','sink','gold','house','stone','gold','wood']})
Now I want to get a list of all persons who don't own a certain item, for example gold. I've fould a way to implement it, but I think there is a better way. That is how I have done it:
allPersons=df['person'].unique()
personWithGold=df[df['item']=='gold']['person'].unique()
personWithoutGold=allPersons[~np.isin(allPersons,personWithGold)]
Any suggestions how to improve the code? I somehow feel that there is a pretty easy one-line solution.
CodePudding user response:
With grouping by persons and further filtering:
no_gold = df.groupby('person').apply(lambda x: (x['item'].ne('gold')).all())
persons_no_gold = no_gold[no_gold].index.values
array(['Sebastian'], dtype=object)
CodePudding user response:
Using pd.get_dummies
and indexing
You can also use pd.get_dummies
and some filtering to get what you need instead of using pd.DataFrame.groupby
import pandas as pd
forbidden_item = 'gold'
pivot = pd.get_dummies(df.set_index('person')['item']).max(level=0)
result = pivot[pivot[forbidden_item]==0].index.tolist()
result
['Sebastian']
The pivot table looks like this, so you can use this to work with more complex logic.
print(pivot)
garden gold house sink stone wood
person
Sebastian 1 0 1 1 0 0
Maria 0 1 1 1 0 0
Achim 0 1 0 0 1 1
CodePudding user response:
You can use boolean-indexing .drop_duplicates()
here:
who_has_gold = df.loc[df.item.eq('gold'), 'person'].drop_duplicates()
persons_without_gold = df.loc[~df['person'].isin(who_has_gold), 'person'].drop_duplicates()
print(persons_without_gold)
Prints:
0 Sebastian
Name: person, dtype: object