Keep only the last record if the value occurs continuously

Time:10-22

Keep only the last record if the value occurs continuously.

Input_df:

Date Value
2022/01/01 5
2022/01/03 4
2022/01/05 3
2022/01/06 3
2022/01/07 3
2022/01/08 4
2022/01/09 3

Output_df:

Date Value
2022/01/01 5
2022/01/03 4
2022/01/07 3
2022/01/08 4
2022/01/09 3

-- The value 3 repeats on 3 consecutive dates, so we keep only the latest of those three records. If a different value appears in between, the run is broken, so that record must not be deleted.

CodePudding user response:

You can use pandas.Series.diff to create a flag that shows whether a value is repeated on the next row. See the pandas documentation for Series.diff.

Then drop the rows where the value is repeated.

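import pandas as pd

# Create the dataframe
df = pd.DataFrame({
    "Date": ["2022/01/01", "2022/01/03", "2022/01/05", "2022/01/06", "2022/01/07", "2022/01/08", "2022/01/09"],
    "Value": [5, 4, 3, 3, 3, 4, 3]
})

# Flag each row with the difference to the next row (0 means the next value is the same).
# The last row has no successor, so its NaN is filled with a non-zero value to keep it.
df['Diff'] = df['Value'].diff(periods=-1).fillna(1)
# Keep only the rows whose next value differs, then drop the helper column
df = df.loc[df['Diff'] != 0, :].drop('Diff', axis=1)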
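
The diff-based flag only works when Value is numeric. As a minimal sketch of the same idea for any column type, each value can be compared with the next row using pandas.Series.shift. The names is_last_of_run and result are made up for this illustration and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2022/01/01", "2022/01/03", "2022/01/05", "2022/01/06",
             "2022/01/07", "2022/01/08", "2022/01/09"],
    "Value": [5, 4, 3, 3, 3, 4, 3],
})

# True where the next row holds a different value, or where there is no next row.
# (Assumption: Value has no missing entries; NaN never compares equal to NaN,
# so consecutive NaN rows would all be kept.)
is_last_of_run = df["Value"] != df["Value"].shift(-1)

# Keep the last row of every run and renumber the index
result = df[is_last_of_run].reset_index(drop=True)
print(result)
```

On the sample data this should keep the rows dated 2022/01/01, 2022/01/03, 2022/01/07, 2022/01/08 and 2022/01/09, which matches Output_df.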

CodePudding user response:

Try:

Input_df.drop_duplicates(keep='last')
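
One caveat worth checking before relying on this: drop_duplicates compares rows across the whole frame, not only neighbouring ones. The sketch below, which assumes the sample Input_df from the question, shows what the call does with and without subset="Value". The names whole_row and by_value are only for illustration.

```python
import pandas as pd

Input_df = pd.DataFrame({
    "Date": ["2022/01/01", "2022/01/03", "2022/01/05", "2022/01/06",
             "2022/01/07", "2022/01/08", "2022/01/09"],
    "Value": [5, 4, 3, 3, 3, 4, 3],
})

# Without subset, a row counts as a duplicate only if Date AND Value both match,
# so nothing is removed here because every date is different.
whole_row = Input_df.drop_duplicates(keep="last")

# With subset="Value", only the last occurrence of each value in the whole
# frame survives, even when the repeats are not consecutive.
by_value = Input_df.drop_duplicates(subset="Value", keep="last")

print(len(whole_row))
print(by_value)
```

Neither result matches Output_df on this data: the first call keeps all 7 rows, and the second keeps only 3 rows, dropping the earlier 4 on 2022/01/03 and the 3 on 2022/01/07.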