I would like to return the size of each run above a target value in a dataframe, so far I've only managed to return a running count of the size of the group using
df.groupby((df["value"] < df["target"]).cumsum()).cumcount()
value | target | current code result | desired result |
---|---|---|---|
5 | 5 | 0 | 0 |
6 | 5 | 1 | 3 |
7 | 5 | 2 | 3 |
6 | 5 | 3 | 3 |
5 | 5 | 0 | 0 |
CodePudding user response:
Let's make this a bit more interesting by having two runs in the df:
df = pd.DataFrame(columns = ['value','target'], data = [[5, 5],
[6, 5],
[7, 5],
[6, 5],
[5, 5],
[6, 5],
[7, 5],
[6, 5],
[5, 5],
])
Then we groupby on a combination of two conditions that mark the start and end of a run, and calculate the length in the run assigning it to runcount
:
(df.groupby(
[(df['value'] <= df['target']).cumsum(),
df['value'] > df['target']
])
.apply(lambda g: g.assign(runcount = (g['value']>g['target']).sum()))
.reset_index(drop = True)
)
output
value target runcount
-- ------- -------- ----------
0 5 5 0
1 6 5 3
2 7 5 3
3 6 5 3
4 5 5 0
5 6 5 3
6 7 5 3
7 6 5 3
8 5 5 0
CodePudding user response:
You can create a boolean column 'foo'
that evaluates True if value > target
False otherwise. Then groupby
"foo" and call the sum
function on the "foo" column in each group (note that True=1 and False=0):
df['result'] = df.assign(foo=df['value'].gt(df['target'])).groupby('foo')['foo'].transform('sum')
Output:
value target current code result desired result result
0 5 5 0 0 0
1 6 5 1 3 3
2 7 5 2 3 3
3 6 5 3 3 3
4 5 5 4 0 0