return size of the group in each row of a pandas dataframe-CodePudding

I would like to return the size of each run above a target value in a dataframe, so far I've only managed to return a running count of the size of the group using

df.groupby((df["value"] < df["target"]).cumsum()).cumcount()

value	target	current code result	desired result
5	5	0	0
6	5	1	3
7	5	2	3
6	5	3	3
5	5	0	0

CodePudding user response：

Let's make this a bit more interesting by having two runs in the df:

df = pd.DataFrame(columns = ['value','target'], data = [[5, 5], 
[6, 5], 
[7, 5], 
[6, 5], 
[5, 5],
[6, 5], 
[7, 5], 
[6, 5], 
[5, 5],
])

Then we groupby on a combination of two conditions that mark the start and end of a run, and calculate the length in the run assigning it to runcount:

(df.groupby(
    [(df['value'] <= df['target']).cumsum(), 
      df['value'] > df['target']
      ])
   .apply(lambda g: g.assign(runcount = (g['value']>g['target']).sum()))
   .reset_index(drop = True)
)

output

      value    target    runcount
--  -------  --------  ----------
 0        5         5           0
 1        6         5           3
 2        7         5           3
 3        6         5           3
 4        5         5           0
 5        6         5           3
 6        7         5           3
 7        6         5           3
 8        5         5           0

CodePudding user response：

You can create a boolean column 'foo' that evaluates True if value > target False otherwise. Then groupby "foo" and call the sum function on the "foo" column in each group (note that True=1 and False=0):

df['result'] = df.assign(foo=df['value'].gt(df['target'])).groupby('foo')['foo'].transform('sum')

Output:

   value  target  current code result  desired result  result
0      5       5                    0               0       0
1      6       5                    1               3       3
2      7       5                    2               3       3
3      6       5                    3               3       3
4      5       5                    4               0       0