Count occurrences within a specific range-CodePudding

I have a data frame that looks like this:

               Tag
0           skip_1
1              run
2           skip_1
3              run
4           skip_1
5              run
6           skip_2
7              run
8           skip_1
9              run
10          skip_2
11            jump
12          skip_1
13             run
14          skip_2
15            jump
16          skip_1
17             run
18          skip_2
19    cleanup_jump
20          skip_1
21             run
22          skip_2
23             run
24          skip_2
25            jump
26          skip_1
27             run
28          skip_2
29            jump

First, I would like to count the RUN occurrences between two JUMP events, then to enumerate this occurrences from the latest to the earliest within this range. The expected results would be:

             Tag  Jump_Run_Count  Run_Order
0         skip_1               0          0
1            run               0          5
2         skip_1               0          0
3            run               0          4
4         skip_1               0          0
5            run               0          3
6         skip_2               0          0
7            run               0          2
8         skip_1               0          0
9            run               0          1
10        skip_2               0          0
11          jump               5          0
12        skip_1               0          0
13           run               0          1
14        skip_2               0          0
15          jump               1          0
16        skip_1               0          0
17           run               0          0
18        skip_2               0          0
19  cleanup_jump               0          0
20        skip_1               0          0
21           run               0          2
22        skip_2               0          0
23           run               0          1
24        skip_2               0          0
25          jump               2          0
26        skip_1               0          0
27           run               0          1
28        skip_2               0          0
29          jump               1          0

One of the problems here is that the first RUN occurrences are not within 2 JUMP but are between the first JUMP and the beginning of the column.

Secondly I would like to do the same count and enumerate for a CLEANUP_JUMP and JUMP range, and store it in separate columns.

             Tag  Jump_Run_Count  Run_Order  Cleanup_Jump_Dig_Count  Run_Order2
0         skip_1               0          0                       0           0
1            run               0          5                       0           0
2         skip_1               0          0                       0           0
3            run               0          4                       0           0
4         skip_1               0          0                       0           0
5            run               0          3                       0           0
6         skip_2               0          0                       0           0
7            run               0          2                       0           0
8         skip_1               0          0                       0           0
9            run               0          1                       0           0
10        skip_2               0          0                       0           0
11          jump               5          0                       0           0
12        skip_1               0          0                       0           0
13           run               0          1                       0           0
14        skip_2               0          0                       0           0
15          jump               1          0                       0           0
16        skip_1               0          0                       0           0
17           run               0          0                       0           1
18        skip_2               0          0                       0           0
19  cleanup_jump               0          0                       1           0
20        skip_1               0          0                       0           0
21           run               0          2                       0           0
22        skip_2               0          0                       0           0
23           run               0          1                       0           0
24        skip_2               0          0                       0           0
25          jump               2          0                       0           0
26        skip_1               0          0                       0           0
27           run               0          1                       0           0
28        skip_2               0          0                       0           0
29          jump               1          0                       0           0

I have added some pictures that might explain it better:

Scenario1

Scenario2

Any help on how to code this, or even another way to approach this issue will be highly appreciated.

Thanks!

CodePudding user response：

Here is a solution using pandas:

import pandas as pd
import numpy as np

df['run'] = df['Tag'] == 'run'
val_mask = df['Tag'].replace({'cleanup_jump':'jump'}) == 'jump'
df['tag_id'] = (val_mask).cumsum()
df.loc[val_mask, 'Jump_Count'] = df.groupby('tag_id')['run'].sum().to_numpy()[:-1]
df.loc[df['run'], 'run_per_jump'] = df.loc[df['run']].groupby('tag_id')['run'].cumsum()
df['Jump_Run_Order'] = df.groupby('tag_id')['run_per_jump'].rank(method='dense', ascending=False)

jumps_idx = np.flatnonzero(df['Tag'] == 'jump')
cj_idxs = np.flatnonzero(df['Tag'] == 'cleanup_jump')
cj_help_idxs = np.asarray([np.max(jumps_idx[jumps_idx < cj_idx]) for cj_idx in cj_idxs])

for start, end in zip(cj_help_idxs 1, cj_idxs):
    df.loc[start:end, 'Cleanup_Jump_Count'] = df.loc[start:end, 'Jump_Count']
    df.loc[start:end, 'Cleanup_Jump_Run_Order'] = df.loc[start:end, 'Jump_Run_Order']
    df.loc[start:end, 'Jump_Run_Order'] = 0
    df.loc[start:end, 'Jump_Count'] = 0

df = df.drop(columns=['tag_id', 'run', 'run_per_jump']).fillna(0).convert_dtypes(convert_integer=True)

print(df)

             Tag  Jump_Count  Jump_Run_Order  Cleanup_Jump_Run_Order  Cleanup_Jump_Count
0         skip_1           0               0                       0                   0
1            run           0               5                       0                   0
2         skip_1           0               0                       0                   0
3            run           0               4                       0                   0
4         skip_1           0               0                       0                   0
5            run           0               3                       0                   0
6         skip_2           0               0                       0                   0
7            run           0               2                       0                   0
8         skip_1           0               0                       0                   0
9            run           0               1                       0                   0
10        skip_2           0               0                       0                   0
11          jump           5               0                       0                   0
12        skip_1           0               0                       0                   0
13           run           0               1                       0                   0
14        skip_2           0               0                       0                   0
15          jump           1               0                       0                   0
16        skip_1           0               0                       0                   0
17           run           0               0                       1                   0
18        skip_2           0               0                       0                   0
19  cleanup_jump           0               0                       0                   1
20        skip_1           0               0                       0                   0
21           run           0               2                       0                   0
22        skip_2           0               0                       0                   0
23           run           0               1                       0                   0
24        skip_2           0               0                       0                   0
25          jump           2               0                       0                   0
26        skip_1           0               0                       0                   0
27           run           0               1                       0                   0
28        skip_2           0               0                       0                   0
29          jump           1               0                       0                   0