How to retrieve only the rows that follow a specific number sequence? Python Pandas

I would like to keep only the rows where a 15 is immediately followed by a 25 in this dataframe, as marked in the following example:

Step
25
15 <--
25 <--
15 <--
25 <--
25
25
25
15
15
15 <--
25 <--
15

The problem is that 25 or 15 can be repeated several times in a row, irregularly (for example: 15 - 15 - 15 - 25), so I don't know how to handle it. Overall, I would like to get the 15 that comes immediately before a 25, together with that 25.

The result must be:

Step

15 <--
25 <--
15 <--
25 <--

15 <--
25 <--

CodePudding user response：

Create virtual groups with `df['Step'].eq(15).cumsum()`, so that every 15 starts a new group, and pass them to `groupby`. Then keep the first two rows (15, 25) of each group that contains more than one row.

>>> df.groupby(df['Step'].eq(15).cumsum()) \
      .apply(lambda x: x[:2] if len(x)>1 else None) \
      .droplevel(0).rename_axis(None)

    Step
1     15
2     25
3     15
4     25
10    15
11    25
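
For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same grouping idea. It rebuilds the sample data from the question and swaps `apply` for `cumcount` and `transform('size')`. The names `group_id`, `group_size`, `position` and `result` are only illustrative, and it assumes `Step` contains nothing but 15 and 25.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({'Step': [25, 15, 25, 15, 25, 25, 25, 25, 15, 15, 15, 25, 15]})

# Every 15 starts a new group; rows before the first 15 fall into group 0
group_id = df['Step'].eq(15).cumsum()

# Number of rows in each row's group, and the row's position inside its group
group_size = group_id.groupby(group_id).transform('size')
position = df.groupby(group_id).cumcount()

# Keep the first two rows of every group that has more than one row
result = df[(group_size > 1) & (position < 2)]
print(result)
```

This should select the same rows as the `apply` version (index 1, 2, 3, 4, 10, 11) while keeping the original index, so no `droplevel` is needed.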

Details:

>>> pd.concat([df['Step'], df['Step'].eq(15).cumsum()], axis=1)
    Step  Step
0     25     0    # drop, only one item in the group
1     15     1  # keep, first item of a group where length > 1
2     25     1  # keep, second item of a group where length > 1
3     15     2  # keep, first item of a group where length > 1
4     25     2  # keep, second item of a group where length > 1
5     25     2    # drop, third item of a group where length > 1
6     25     2    # drop, fourth item of a group where length > 1
7     25     2    # drop, fifth item of a group where length > 1
8     15     3    # drop, only one item in the group
9     15     4    # drop, only one item in the group
10    15     5  # keep, first item of a group where length > 1
11    25     5  # keep, second item of a group where length > 1
12    15     6    # drop, only one item in the group
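
One edge case to keep in mind: all rows before the first 15 land in group 0, so if the dataframe started with two or more 25s, the grouping above would keep the first two of them. A possible alternative, shown here only as a sketch and again assuming `Step` holds just 15 and 25, compares each row with its neighbour using `shift`. The input below is a hypothetical example made up to show that case, and `is_start` / `is_end` are illustrative names.

```python
import pandas as pd

# Hypothetical input that starts with two 25s (not the data from the question)
df = pd.DataFrame({'Step': [25, 25, 15, 25, 15, 15, 25, 15]})

step = df['Step']

# True on a 15 whose next row is a 25
is_start = step.eq(15) & step.shift(-1).eq(25)

# True on the 25 that directly follows such a 15
is_end = is_start.shift(fill_value=False)

# Keep both halves of every 15 -> 25 pair
result = df[is_start | is_end]
print(result)
```

Here the rows at index 2, 3, 5 and 6 are kept, and the two leading 25s are dropped.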