Keep only the last record if the value occurs continuously

Keep only the last record if a value occurs continuously.

Input_df:

Date	Value
2022/01/01	5
2022/01/03	4
2022/01/05	3
2022/01/06	3
2022/01/07	3
2022/01/08	4
2022/01/09	3

Output_df:

Date	Value
2022/01/01	5
2022/01/03	4
2022/01/07	3
2022/01/08	4
2022/01/09	3

-- The value 3 repeats on three consecutive dates, so only the latest of those three records is kept. If a different value appears in between, the run is broken, so the records on either side of it must not be deleted.

CodePudding user response：

You can use pandas.Series.diff to create a flag that shows whether the column value repeats on the next row. See the pandas documentation for Series.diff.

Then drop the rows whose value repeats on the next row.

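import pandas as pd

# Create the dataframe
df = pd.DataFrame({
    "Date": ["2022/01/01", "2022/01/03", "2022/01/05", "2022/01/06", "2022/01/07", "2022/01/08", "2022/01/09"],
    "Value": [5, 4, 3, 3, 3, 4, 3],
})

# Create a flag: difference between each value and the next one (0 means the value repeats)
# The last row has no next row, so its NaN is filled with 1 to keep it
df['Diff'] = df['Value'].diff(periods=-1).fillna(1)
# Keep only the rows that end a run, then drop the helper column
df = df.loc[df['Diff'] != 0, :].drop('Diff', axis=1)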
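
Note that diff only works on numeric columns. As a minimal sketch of an alternative that should also handle strings, assuming the same Date/Value columns as in the question, you can label each run of equal values and keep the last row of each run. The names run_id and result are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2022/01/01", "2022/01/03", "2022/01/05", "2022/01/06",
             "2022/01/07", "2022/01/08", "2022/01/09"],
    "Value": [5, 4, 3, 3, 3, 4, 3],
})

# Start a new run id whenever the value differs from the previous row
run_id = df["Value"].ne(df["Value"].shift()).cumsum()

# Keep the last row of every run; the original row order is preserved
result = df.groupby(run_id).tail(1)
print(result)
```

Because the run id increases every time the value changes, the 3 on 2022/01/09 lands in a different run from the earlier block of 3s and is kept.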

CodePudding user response：

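Try comparing each value with the next one and keeping the rows where they differ. A plain drop_duplicates(keep='last') is not enough here, because it also removes repeats that are not adjacent:

Input_df[Input_df['Value'].ne(Input_df['Value'].shift(-1))]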
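
To see why drop_duplicates alone does not give the expected output, here is a small sketch on the question's data. It assumes subset="Value" is what was intended, since without a subset whole rows are compared and the unique dates mean nothing is dropped. The names global_dedup and consecutive are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2022/01/01", "2022/01/03", "2022/01/05", "2022/01/06",
             "2022/01/07", "2022/01/08", "2022/01/09"],
    "Value": [5, 4, 3, 3, 3, 4, 3],
})

# Global de-duplication: one row per distinct value, wherever it occurs
global_dedup = df.drop_duplicates(subset="Value", keep="last")

# Consecutive de-duplication: keep a row only when the next value differs
consecutive = df[df["Value"].ne(df["Value"].shift(-1))]

print(global_dedup)
print(consecutive)
```

The global version keeps only three rows (2022/01/01, 2022/01/08, 2022/01/09) and loses the 4 on 2022/01/03 and the 3 on 2022/01/07, while the consecutive version keeps the five rows shown in Output_df.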