I would like to extract only the pairs where a 15 is immediately followed by a 25 from this dataframe, as shown in the following example:
Step
25
15 <--
25 <--
15 <--
25 <--
25
25
25
15
15
15 <--
25 <--
15
The problem is that 25 or 15 can be repeated several times in a row, irregularly (example: 15 - 15 - 15 - 25), so I don't know how to handle it. Overall, I would like to get the 15 that comes immediately before a 25, together with that 25.
The expected result is:
Step
15 <--
25 <--
15 <--
25 <--
15 <--
25 <--
CodePudding user response:
Create virtual groups for groupby with df['Step'].eq(15).cumsum(), so that every 15 starts a new group,
then keep the first two values (15, 25) of each group that contains more than one row.
>>> df.groupby(df['Step'].eq(15).cumsum()) \
...     .apply(lambda x: x[:2] if len(x) > 1 else None) \
...     .droplevel(0).rename_axis(None)
    Step
1     15
2     25
3     15
4     25
10    15
11    25
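For anyone who wants to reproduce this, here is a minimal self-contained sketch. The dataframe is rebuilt from the values in the question. It expresses the same grouping idea with `transform('size')` and `cumcount()` instead of `apply`, which is my own reformulation and not the exact code above. The names `group`, `size`, `pos`, and `result` are only illustrative.

```python
import pandas as pd

# Example data copied from the question
df = pd.DataFrame({'Step': [25, 15, 25, 15, 25, 25, 25, 25, 15, 15, 15, 25, 15]})

# Each 15 starts a new group; rows before the first 15 fall into group 0
group = df['Step'].eq(15).cumsum()

size = group.groupby(group).transform('size')  # number of rows in each row's group
pos = group.groupby(group).cumcount()          # 0-based position inside the group

# Keep the first two rows of every group that has more than one row
result = df[size.gt(1) & pos.lt(2)]
print(result)
```

On the question's data this should select the same rows as the `apply` version (index 1, 2, 3, 4, 10, 11).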
Details:
>>> pd.concat([df['Step'], df['Step'].eq(15).cumsum()], axis=1)
    Step  Step
0     25     0  # drop, only one item in the group
1     15     1  # keep, first item of a group where length > 1
2     25     1  # keep, second item of a group where length > 1
3     15     2  # keep, first item of a group where length > 1
4     25     2  # keep, second item of a group where length > 1
5     25     2  # drop, third item of a group where length > 1
6     25     2  # drop, fourth item of a group where length > 1
7     25     2  # drop, fifth item of a group where length > 1
8     15     3  # drop, only one item in the group
9     15     4  # drop, only one item in the group
10    15     5  # keep, first item of a group where length > 1
11    25     5  # keep, second item of a group where length > 1
12    15     6  # drop, only one item in the group
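One caveat with this grouping: all rows before the first 15 land in group 0, so a frame that starts with two or more 25s would keep that leading pair of 25s. As an alternative sketch (a different technique, not the groupby approach above), comparing each row with its neighbour via `shift` avoids that case. It assumes the column is called `Step` as in the question; `select_pairs` is a hypothetical helper name.

```python
import pandas as pd

def select_pairs(df, first=15, second=25):
    """Keep each `first` row immediately followed by `second`, plus that `second` row."""
    step = df['Step']
    # True on a 15 whose next row is a 25
    starts = step.eq(first) & step.shift(-1).eq(second)
    # True on the row right after each such 15
    ends = starts.shift(fill_value=False)
    return df[starts | ends]

# Data from the question
df = pd.DataFrame({'Step': [25, 15, 25, 15, 25, 25, 25, 25, 15, 15, 15, 25, 15]})
print(select_pairs(df))

# A frame that starts with several 25s: the leading 25s are not selected
df2 = pd.DataFrame({'Step': [25, 25, 15, 25]})
print(select_pairs(df2))
```

The boolean-mask version also keeps the original index without needing `droplevel` or `rename_axis`.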