Home > database >  Return rows in a multi-index df where it contains a specific value
Return rows in a multi-index df where it contains a specific value

Time:11-11

I have the following multi-index df:

         pageid
sid vid
 1  ABC     dog
    ABC     cat
    ABC     fish
 2  DEF     pig
    DEF     cat
    DEF     fish
 3  GHI     pig
    GHI     cat
    GHI     dog

I want to keep sub-indexes that ONLY contain some value—in this example Ill use dog.

Before:

        pageid
sid vid
 1  ABC     dog
    ABC     cat
    ABC     fish
 2  DEF     pig
    DEF     cat
    DEF     fish
 3  GHI     pig
    GHI     cat
    GHI     dog

After (only sid that contain the pageid with value dog)

        pageid
sid vid
 1  ABC     dog
    ABC     cat
    ABC     fish
 3
    GHI     pig
    GHI     cat
    GHI     dog

CodePudding user response:

Here:

df.groupby('sid').filter(lambda x: x['pageid'].eq('dog').sum() > 0)

What that does it it:

  1. Gets a list of sub-DFs, by their ID
  2. Takes only those groups where the count of pageid in them is greater than 0

Or as @d.b suggested:

df.loc[df['pageid'].eq('dog').groupby('sid').transform(any)]

What that does:

  1. Creates a mask where each row is True if pageId is dog, False otherwise
  2. Groups the masked rows by sid
  3. Converts each group of masked rows to one masked row
  4. Takes groups of rows from the dataframe using that mask
  • Related