Python Pandas : find n in a sorted multindex dataframe and return [0:n 1] for each level 1 index-CodePudding

I am having some issues with what I thought a simple filtering task. We have a data that have roughly this shape :

name	item	cumsum
name1	item 1	0.05
	item 2	0.10
	item 3	0.31
name2	item 1	0.02
	item 2	0.07
name3	item 1	0.01
	item 2	0.07
	item 3	0.21
name4	item 1	0.03
	item 2	0.12
	item 3	0.21
	item 4	0.35

What I would like to is to return the dataframe with items smaller than 0.2 and the item directly above. This is table I would like as an output:

name	item	cumsum
name1	item 1	0.05
	item 2	0.10
	item 3	0.31
name2	item 1	0.02
	item 2	0.07
name3	item 1	0.01
	item 2	0.07
	item 3	0.21
name4	item 1	0.03
	item 2	0.12
	item 3	0.21

I tried for each 'name' to find the 'item' that have a cumsum greater than 0.2 and then return the whole range with the indexes as :

    df = df.loc['name1']
    idx = df.loc[df['cumsum'] > 0.2].index[0]
    iidx = df.index.get_loc(idx)   1
    df = df.iloc[:iidx]

and do this for each 'name'. However this fails for name2.

Can anybody help with this please ?

CodePudding user response：

Use | for bitwise OR by mask shifted per groups by DataFrameGroupBy.shift:

m = (df['cumsum'] < 0.2)
df = df[m | m.groupby(level=0).shift(fill_value=False)]
print (df)
              cumsum
name  item          
name1 item 1    0.05
      item 2    0.10
      item 3    0.31
name2 item 1    0.02
      item 2    0.07
name3 item 1    0.01
      item 2    0.07
      item 3    0.21
name4 item 1    0.03
      item 2    0.12
      item 3    0.21