Python Pandas DataFrame - pull last data from reoccurring cycling index (groupby, sortby)-CodePudding

Forgive me as I'm new to python, but I couldn't find any solution to this. I have a DF with a reoccurring Step Index and want to pull last data from Step 14, however this Index repeats. If Step == 14 & the next Step == 15, pull X, Y, Z from (last) Step 14.

Step      X       Y       Z
12   3  7.001   4.015   2.917
12   3  7.001   4.100   3.966
13   3  4.633   4.100   4.439
13   3  0.984   4.100   5.491
13   3  0.898   4.100   5.538
14   3  -7.000  3.913   5.538
14   3  -7.000  3.858   5.538
14   3  -7.000  3.547   5.538
14   3  -7.000  3.523   5.538
15   4  0.000   3.523   5.538
15   4  0.000   3.523   5.538
12   4  7.001   3.809   0.583
12   4  7.001   4.100   4.029
13   4  4.633   4.100   4.502
13   4  0.980   4.100   5.553
13   4  0.910   4.100   5.592
14   4  -7.000  3.912   5.592
14   4  -7.000  3.546   5.592
14   4  -7.000  3.522   5.592
15   5  0.000   3.522   5.592
15   5  0.000   3.522   5.592

Output should be a new DF like this:

Step    X      Y      Z
14  -7.000  3.523   5.538
14  -7.000  3.522   5.592

I'm guessing I will have to to an for statement with if/and to go through all columns to see if Step == 14 and next Step == 15.

Thank you,

CodePudding user response：

IIUC, you can use

m = df['Step'].eq(14) & df['Step'].eq(15).shift(-1)
out = df[m]

print(out)

    Step    X      Y      Z
8     14 -7.0  3.523  5.538
18    14 -7.0  3.522  5.592

To keep the last line of Step 14

m = df['Step'].eq(14) & df['Step'].ne(14).shift(-1)
out = df[m]

CodePudding user response：

I am a bit confused about the first row in the table that has no title (between 'Step' and 'X'. Is it part of the index?

This one line is sufficient in normal circumstances. Functions like this will typically have the best performance.

new = df.drop_duplicates('Step', keep='last')

If you need to go through each line (to add some other logic to the sorting), you can use this code (with potentially reduced performance):

step = 0
for ix, row in df.iterrows():
    if step == row.Step: #if the same as the previous row
        df = df.drop(ix)