I have two columns of data, 'Part' and 'Qty', where some part numbers are repeated across multiple rows. The quantity I need is in the last row before the part number changes.
My code (below) adds a True/False column that flags when the part number changes. My idea was to retrieve the data from the previous row whenever the flag is True, but this does not work for the first and last rows.
Running my code gives the output on the left; the data I'm trying to extract is on the right:
What is the best way to achieve this?
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'part_no_2': [22, 22, 22, 23, 23, 24, 25, 25, 25, 26],
    'qty': [0, 0, 4, 44, 22, 0, 7, 16, 5, 6]})

# Flag rows where the part number differs from the previous row
df['part_no_change'] = np.where(df['part_no_2'].shift() != df['part_no_2'], True, False)
print(df)
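The existing flag can still be reused, as a minimal sketch. The flag marks the *first* row of each run, so the row just before a True is the *last* row of the previous run. Shifting the flag up by one row (`shift(-1)`) lines each row up with the next row's flag, and `fill_value=True` treats the final row as a run end because there is no row after it. The names `is_last` and `result` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'part_no_2': [22, 22, 22, 23, 23, 24, 25, 25, 25, 26],
    'qty': [0, 0, 4, 44, 22, 0, 7, 16, 5, 6]})

# True on the first row of each run of identical part numbers
df['part_no_change'] = df['part_no_2'].shift() != df['part_no_2']

# A row is the last of its run when the *next* row starts a new run.
# Shifting the flag up by one row does that; the final row has no next
# row, so it is filled with True.
is_last = df['part_no_change'].shift(-1, fill_value=True)

result = df.loc[is_last, ['part_no_2', 'qty']]
print(result)
```

This sidesteps both edge cases: the first row's flag is simply shifted away, and the last row is filled in explicitly.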
Answer:
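Try shift(-1), which compares each row with the next one, so the last row of each run of part numbers is kept. The final row is compared against NaN, so it is always kept as well: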
df[df.part_no_2 != df.part_no_2.shift(-1)]
   part_no_2  qty
2         22    4
4         23   22
5         24    0
8         25    5
9         26    6
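An equivalent formulation, sketched here as an alternative rather than as part of the answer above, labels each consecutive run with a cumulative sum of the "part number changed" flag and keeps the last row of each group. The variable names `run_id` and `last_rows` are illustrative. This form is handy if you later need other per-run aggregates (for example `sum` or `max` of `qty`) instead of just the last row.

```python
import pandas as pd

df = pd.DataFrame({
    'part_no_2': [22, 22, 22, 23, 23, 24, 25, 25, 25, 26],
    'qty': [0, 0, 4, 44, 22, 0, 7, 16, 5, 6]})

# Label each consecutive run: the counter increases by 1 every time the
# part number differs from the previous row.
run_id = (df['part_no_2'] != df['part_no_2'].shift()).cumsum()

# Keep the last row of every run; the original index and order are preserved.
last_rows = df.groupby(run_id).tail(1)
print(last_rows)
```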
Answer:
Pandas has a built-in method to do this; with keep='last' it keeps the last occurrence of each part number:
df.drop_duplicates(subset=['part_no_2'], keep='last')
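One caveat worth checking against your real data: `drop_duplicates(..., keep='last')` keeps the last occurrence of each part number *anywhere* in the frame, while the `shift(-1)` comparison keeps the last row of each *consecutive* run. The two agree for the sample data, where every part number appears in a single block, but they differ if a part number reappears later. The small frame below is hypothetical, made up purely to show the difference.

```python
import pandas as pd

# Hypothetical data: part 22 appears in two separate runs.
df = pd.DataFrame({
    'part_no_2': [22, 22, 23, 22],
    'qty': [1, 2, 3, 4]})

# Last row of each consecutive run
by_run = df[df['part_no_2'] != df['part_no_2'].shift(-1)]

# Last occurrence of each part number anywhere in the frame
by_part = df.drop_duplicates(subset=['part_no_2'], keep='last')

print(by_run)   # keeps rows 1, 2 and 3
print(by_part)  # keeps rows 2 and 3 only; row 1 (qty 2) is dropped
```

If each part number is guaranteed to appear in one contiguous block, either approach works; otherwise pick the one that matches the rows you actually want.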