Home > OS >  How to filter subcategories of rows from one column, based on counts in second column
How to filter subcategories of rows from one column, based on counts in second column

Time:10-11

Sorry it's a bit complicated, but lets say I have a very long table of IDs and Fruits:

ID Fruit
1 Apple
2 Banana
4 Orange
... ...
3 Banana
1 Orange

The ID may be repeated several times in the table and the fruit may also be repeat several times. For example, in the whole dataframe, ID #1 can have 3 instances of "Apple" and 2 instances of "Orange", etc. ID #2 can have 0 instances of "Apple" and 5 instances of "Orange"

My goal is to filter the table such that as long as the ID has one instance of fruit "Apple", then ALL rows with that ID should remain in the table. So let's say that in the table above, ID # 4 and ID #2 do not have a single instance of "Apple" in the entire dataframe. So the ALL rows that contained ID 4 and 2 should be removed from the dataframe

ID Fruit
1 Apple
... ...
3 Banana
1 Orange

Any help is appreciated, thank you!

CodePudding user response:

You can use groupby.transform('any') combined with boolean indexing.

out = df[df['Fruit'].eq('Apple').groupby(df['ID']).transform('any')]

CodePudding user response:

Use:

df[df['ID'].isin(df.loc[df['Fruit'].eq('Apple'), 'ID'])]
  • Related