I have the following multi-index df:
pageid
sid vid
1 ABC dog
ABC cat
ABC fish
2 DEF pig
DEF cat
DEF fish
3 GHI pig
GHI cat
GHI dog
I want to keep sub-indexes that ONLY contain some value—in this example Ill use dog.
Before:
pageid
sid vid
1 ABC dog
ABC cat
ABC fish
2 DEF pig
DEF cat
DEF fish
3 GHI pig
GHI cat
GHI dog
After (only sid
that contain the pageid
with value dog)
pageid
sid vid
1 ABC dog
ABC cat
ABC fish
3
GHI pig
GHI cat
GHI dog
CodePudding user response:
Here:
df.groupby('sid').filter(lambda x: x['pageid'].eq('dog').sum() > 0)
What that does it it:
- Gets a list of sub-DFs, by their ID
- Takes only those groups where the count of
pageid
in them is greater than0
Or as @d.b suggested:
df.loc[df['pageid'].eq('dog').groupby('sid').transform(any)]
What that does:
- Creates a mask where each row is
True
ifpageId
isdog
,False
otherwise - Groups the masked rows by
sid
- Converts each group of masked rows to one masked row
- Takes groups of rows from the dataframe using that mask