Keep only the last record when the same value occurs on consecutive rows.
Input_df:
Date | Value |
---|---|
2022/01/01 | 5 |
2022/01/03 | 4 |
2022/01/05 | 3 |
2022/01/06 | 3 |
2022/01/07 | 3 |
2022/01/08 | 4 |
2022/01/09 | 3 |
Output_df:
Date | Value |
---|---|
2022/01/01 | 5 |
2022/01/03 | 4 |
2022/01/07 | 3 |
2022/01/08 | 4 |
2022/01/09 | 3 |
The value 3 repeats on three consecutive dates, so only the latest of those three records should be kept. If a different value appears in between, the run is broken, so the later record must not be deleted.
CodePudding user response:
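You can use pandas.Series.diff
to create a flag that shows whether the column value repeats on consecutive rows. With periods = -1 each value is compared with the next row, so a difference of 0 marks a row that is followed by the same value. See the documentation for pandas.Series.diff.
Then drop the rows that are followed by the same value.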
import pandas as pd

# Create the dataframe
df = pd.DataFrame({
"Date" : ["2022/01/01", "2022/01/03", "2022/01/05", "2022/01/06", "2022/01/07", "2022/01/08", "2022/01/09"],
"Value" : [5, 4, 3, 3, 3, 4, 3]
})
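# Create a flag: difference to the next row (the last row has no next row, so fill with 1 to keep it)
df['Diff'] = df['Value'].diff(periods = -1).fillna(1)
# Keep the rows whose value differs from the next row, then drop the helper column
df = df.loc[df['Diff'] != 0, :].drop('Diff', axis = 1)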
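
The diff approach relies on subtraction, so it assumes a numeric Value column. As a hedged alternative (not part of the original answer), the same "differs from the next row" flag can be sketched with a shift comparison, which should also work for strings. The names is_last_of_run and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2022/01/01", "2022/01/03", "2022/01/05", "2022/01/06",
             "2022/01/07", "2022/01/08", "2022/01/09"],
    "Value": [5, 4, 3, 3, 3, 4, 3],
})

# True where the next row holds a different value.
# The last row is compared with NaN, which never equals anything, so it is kept.
is_last_of_run = df["Value"].ne(df["Value"].shift(-1))

# Keep only the last row of each run of consecutive equal values.
result = df.loc[is_last_of_run].reset_index(drop=True)
print(result)
```

For the sample data this keeps the rows dated 2022/01/01, 2022/01/03, 2022/01/07, 2022/01/08 and 2022/01/09.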
CodePudding user response:
Try:
Input_df.drop_duplicates(keep='last')
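
Note that drop_duplicates compares rows across the whole frame rather than only neighbouring rows, so it may not give the requested output. The sketch below, which assumes the sample data from the question, contrasts it with a run-based grouping. The names whole, by_value, run_id and last_of_run are hypothetical helpers introduced here, not part of the original answer.

```python
import pandas as pd

Input_df = pd.DataFrame({
    "Date": ["2022/01/01", "2022/01/03", "2022/01/05", "2022/01/06",
             "2022/01/07", "2022/01/08", "2022/01/09"],
    "Value": [5, 4, 3, 3, 3, 4, 3],
})

# Whole-row comparison: every (Date, Value) pair is unique, so nothing is dropped.
whole = Input_df.drop_duplicates(keep="last")

# Restricted to Value: keeps the last occurrence of each distinct value
# anywhere in the column, not the last row of each consecutive run.
by_value = Input_df.drop_duplicates(subset="Value", keep="last")

# Run-based grouping: the id increases whenever the value changes from the
# previous row, so each run of consecutive equal values gets its own id.
run_id = Input_df["Value"].ne(Input_df["Value"].shift()).cumsum()
last_of_run = Input_df.groupby(run_id).tail(1)

print(len(whole))
print(by_value)
print(last_of_run)
```

Here whole still has all 7 rows, by_value keeps only the rows dated 2022/01/01, 2022/01/08 and 2022/01/09, and last_of_run matches the Output_df from the question.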