Pandas extract previous row on value change

Time:11-07

I have two columns of data, 'Part' and 'Qty', where some part numbers are repeated across multiple rows. The quantity I need is the one in the final row before the part number changes.

My code (below) adds a True/False column that flags when the part number changes. I thought that whenever the flag is True I could take the data from the previous row, but this does not work for the first and last rows.

Running my code gives the output on the left; the data I'm trying to extract is on the right:

[image: output]

What is the best way to achieve this?

    import pandas as pd
    import numpy as np
    df = pd.DataFrame({
        'part_no_2': [22, 22, 22, 23, 23, 24, 25, 25, 25, 26],
        'qty': [0, 0, 4, 44, 22, 0, 7, 16, 5, 6]})

    # Flag rows whose part number differs from the previous row (the first row of each run)
    df['part_no_change'] = np.where(df["part_no_2"].shift() != df["part_no_2"], True, False)

    df

CodePudding user response:

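Try shift(-1), which compares each row with the next row rather than the previous one, so the rows selected are the last row of each run:

    df[df.part_no_2 != df.part_no_2.shift(-1)]

       part_no_2  qty
    2         22    4
    4         23   22
    5         24    0
    8         25    5
    9         26    6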
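
As a minimal sketch of why this also handles the first and last rows, the comparison can be spelled out step by step. The data is the frame from the question; the names `is_last_in_run` and `last_rows` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'part_no_2': [22, 22, 22, 23, 23, 24, 25, 25, 25, 26],
    'qty': [0, 0, 4, 44, 22, 0, 7, 16, 5, 6]})

# shift(-1) moves every value up one row, so each row is compared with the NEXT row.
# The final row has no successor and gets NaN, which never equals a part number,
# so the final row is always flagged as the end of a run.
is_last_in_run = df['part_no_2'] != df['part_no_2'].shift(-1)

# Keep only the rows that close a run of identical part numbers.
last_rows = df[is_last_in_run]
print(last_rows)
```

By contrast, `shift()` with no argument compares each row with the previous one, so it flags the first row of each run, which is why the original flag column needed a "previous row" lookup that breaks at the edges.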

CodePudding user response:

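Pandas has a built-in method for this, provided each part number appears in only one contiguous block (keep='last' keeps the last occurrence of each value in the whole column, not the last row of each run):

    df.drop_duplicates(subset=['part_no_2'], keep='last')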
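
The two answers give the same result on the question's data, but they can differ if a part number reappears later in the frame. The sketch below uses made-up data (`df2` is hypothetical, not from the question) to show the difference.

```python
import pandas as pd

# Hypothetical data: part 22 occurs in two separate runs.
df2 = pd.DataFrame({
    'part_no_2': [22, 22, 23, 22, 22],
    'qty': [1, 2, 3, 4, 5]})

# Last row of every consecutive run (the shift(-1) approach).
by_run = df2[df2['part_no_2'] != df2['part_no_2'].shift(-1)]

# Last occurrence of every distinct part number (the drop_duplicates approach).
by_value = df2.drop_duplicates(subset=['part_no_2'], keep='last')

print(by_run)    # one row per run, so part 22 appears twice
print(by_value)  # one row per part number
```

Which one is appropriate depends on whether a repeated part number should count as a new run or as the same part.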