Home > database >  Python Pandas : find n in a sorted multindex dataframe and return [0:n 1] for each level 1 index
Python Pandas : find n in a sorted multindex dataframe and return [0:n 1] for each level 1 index

Time:11-09

I am having some issues with what I thought a simple filtering task. We have a data that have roughly this shape :

name item cumsum
name1 item 1 0.05
item 2 0.10
item 3 0.31
name2 item 1 0.02
item 2 0.07
name3 item 1 0.01
item 2 0.07
item 3 0.21
name4 item 1 0.03
item 2 0.12
item 3 0.21
item 4 0.35

What I would like to is to return the dataframe with items smaller than 0.2 and the item directly above. This is table I would like as an output:

name item cumsum
name1 item 1 0.05
item 2 0.10
item 3 0.31
name2 item 1 0.02
item 2 0.07
name3 item 1 0.01
item 2 0.07
item 3 0.21
name4 item 1 0.03
item 2 0.12
item 3 0.21

I tried for each 'name' to find the 'item' that have a cumsum greater than 0.2 and then return the whole range with the indexes as :

    df = df.loc['name1']
    idx = df.loc[df['cumsum'] > 0.2].index[0]
    iidx = df.index.get_loc(idx)   1
    df = df.iloc[:iidx]

and do this for each 'name'. However this fails for name2.

Can anybody help with this please ?

CodePudding user response:

Use | for bitwise OR by mask shifted per groups by DataFrameGroupBy.shift:

m = (df['cumsum'] < 0.2)
df = df[m | m.groupby(level=0).shift(fill_value=False)]
print (df)
              cumsum
name  item          
name1 item 1    0.05
      item 2    0.10
      item 3    0.31
name2 item 1    0.02
      item 2    0.07
name3 item 1    0.01
      item 2    0.07
      item 3    0.21
name4 item 1    0.03
      item 2    0.12
      item 3    0.21
  • Related