Return rows in a multi-index df where it contains a specific value-CodePudding

I have the following multi-index df:

         pageid
sid vid
 1  ABC     dog
    ABC     cat
    ABC     fish
 2  DEF     pig
    DEF     cat
    DEF     fish
 3  GHI     pig
    GHI     cat
    GHI     dog

I want to keep sub-indexes that ONLY contain some value—in this example Ill use dog.

Before:

        pageid
sid vid
 1  ABC     dog
    ABC     cat
    ABC     fish
 2  DEF     pig
    DEF     cat
    DEF     fish
 3  GHI     pig
    GHI     cat
    GHI     dog

After (only sid that contain the pageid with value dog)

        pageid
sid vid
 1  ABC     dog
    ABC     cat
    ABC     fish
 3
    GHI     pig
    GHI     cat
    GHI     dog

CodePudding user response：

Here:

df.groupby('sid').filter(lambda x: x['pageid'].eq('dog').sum() > 0)

What that does it it:

Gets a list of sub-DFs, by their ID
Takes only those groups where the count of pageid in them is greater than 0

Or as @d.b suggested:

df.loc[df['pageid'].eq('dog').groupby('sid').transform(any)]

What that does:

Creates a mask where each row is True if pageId is dog, False otherwise
Groups the masked rows by sid
Converts each group of masked rows to one masked row
Takes groups of rows from the dataframe using that mask