Pandas - filter rows with same value in one column and multiple values in another column based on th

Time:04-12

I have the following base table that I would like to split into a "has guava" table and a "does not have guava" table. I'm thinking of using a flag to get the intermediate table below, but I'm not sure where to go from there.

base table

user_id fruit  
user1   passionfruit  
user1   guava
user1   banana
user2   orange
user2   coconut
user3   guava
user4   melon

has guava

user_id fruit  
user1   passionfruit  
user1   guava
user1   banana
user3   guava

does not have guava

user_id fruit  
user2   orange
user2   coconut
user4   melon

intermediate table

user_id fruit        has_guava
user1   passionfruit 0 
user1   guava        1
user1   banana       0
user2   orange       0
user2   coconut      0
user3   guava        1
user4   melon        0

CodePudding user response:

Try groupby followed by filter: it keeps all rows of any user whose group contains at least one guava.

df_ = (
    df.groupby('user_id')
      .filter(lambda group: group['fruit'].eq('guava').any())
)
print(df_)

  user_id         fruit
0   user1  passionfruit
1   user1         guava
2   user1        banana
5   user3         guava
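
The question also asks for the "does not have guava" table, which the snippet above does not produce. A minimal sketch of one way to get both, assuming the base table is rebuilt as shown here (the helper name user_has_guava and the variable names are made up for illustration): negate the same group-level test.

```python
import pandas as pd

# Base table from the question, rebuilt by hand for this sketch.
df = pd.DataFrame({
    "user_id": ["user1", "user1", "user1", "user2", "user2", "user3", "user4"],
    "fruit": ["passionfruit", "guava", "banana", "orange", "coconut", "guava", "melon"],
})

def user_has_guava(group):
    # Hypothetical helper: True if any row in this user's group is a guava.
    return bool(group["fruit"].eq("guava").any())

# Keep the groups that pass the test, then the groups that fail it.
has_guava_df = df.groupby("user_id").filter(user_has_guava)
no_guava_df = df.groupby("user_id").filter(lambda g: not user_has_guava(g))

print(has_guava_df)
print(no_guava_df)
```

Both results keep the original row index, so together they cover every row of the base table exactly once.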

CodePudding user response:

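Without groupby, you can use isin: select the user_ids that have a guava row, then keep every row whose user_id is in that set.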

out = df[df.user_id.isin(df.loc[df.fruit.isin(['guava']), 'user_id'])]
print(out)

  user_id         fruit
0   user1  passionfruit
1   user1         guava
2   user1        banana
5   user3         guava
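
To follow the flag idea from the question, one possible sketch is to build the has_guava column first and then spread it to every row of the same user. This assumes the base table is rebuilt as below; the names guava_user, has_guava_df, and no_guava_df are illustrative only and not from the original answers.

```python
import pandas as pd

# Base table from the question, rebuilt by hand for this sketch.
df = pd.DataFrame({
    "user_id": ["user1", "user1", "user1", "user2", "user2", "user3", "user4"],
    "fruit": ["passionfruit", "guava", "banana", "orange", "coconut", "guava", "melon"],
})

# Row-level flag, matching the intermediate table in the question.
df["has_guava"] = df["fruit"].eq("guava").astype(int)

# User-level mask: the per-user maximum of the flag is 1 when the user
# has at least one guava row; transform broadcasts it back to each row.
guava_user = df.groupby("user_id")["has_guava"].transform("max").eq(1)

# Split on the mask and its negation, dropping the helper column.
has_guava_df = df.loc[guava_user, ["user_id", "fruit"]]
no_guava_df = df.loc[~guava_user, ["user_id", "fruit"]]

print(has_guava_df)
print(no_guava_df)
```

The same negation with ~ also works on the isin mask from the answer above if you do not need the flag column.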