I have some time series data in a pandas DataFrame that I know should always increase, but it contains some incorrect low values, as shown below.
22-01-17 0
22-01-18 45
22-01-19 78
22-01-20 98
22-01-21 6 // bad
22-01-22 7 // bad
22-01-23 4 // bad
22-01-24 101
How can I remove the regions of the data that are less than the previous good value?
I don't mind whether we remove those values or replace them with the last good value.
Using the example above, how could I get either
22-01-17 0
22-01-18 45
22-01-19 78
22-01-20 98
22-01-21 98
22-01-22 98
22-01-23 98
22-01-24 101
or
22-01-17 0
22-01-18 45
22-01-19 78
22-01-20 98
22-01-21 NaN
22-01-22 NaN
22-01-23 NaN
22-01-24 101
Thanks
CodePudding user response:
Assuming s is your Series.
To get the first option:
s.cummax()
output:
22-01-17 0
22-01-18 45
22-01-19 78
22-01-20 98
22-01-21 98
22-01-22 98
22-01-23 98
22-01-24 101
dtype: int64
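For anyone who wants to reproduce this, here is a minimal sketch. The way the Series is built (plain string labels as the index, integer values) is an assumption made for illustration; the question does not show how the data is actually stored.

```python
import pandas as pd

# Hypothetical reconstruction of the sample data from the question;
# the index is kept as plain strings, which is an assumption.
s = pd.Series(
    [0, 45, 78, 98, 6, 7, 4, 101],
    index=["22-01-17", "22-01-18", "22-01-19", "22-01-20",
           "22-01-21", "22-01-22", "22-01-23", "22-01-24"],
)

# cummax() returns the running maximum, so any value lower than the
# largest value seen so far is replaced by that earlier maximum.
filled = s.cummax()
print(filled)
```

Note that this relies on the bad values always being too low. A bad value that is too high would become the new running maximum and be carried forward.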
For the second:
s.mask(s.lt(s.cummax()))
output:
22-01-17 0.0
22-01-18 45.0
22-01-19 78.0
22-01-20 98.0
22-01-21 NaN
22-01-22 NaN
22-01-23 NaN
22-01-24 101.0
dtype: float64
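The question also allows removing the bad rows entirely. A small sketch of both variants follows, reusing the same comparison against the running maximum. The sample Series and the names bad, masked and dropped are made up for illustration.

```python
import pandas as pd

# Hypothetical reconstruction of the sample data from the question.
s = pd.Series(
    [0, 45, 78, 98, 6, 7, 4, 101],
    index=["22-01-17", "22-01-18", "22-01-19", "22-01-20",
           "22-01-21", "22-01-22", "22-01-23", "22-01-24"],
)

# True wherever a value is lower than the running maximum so far.
bad = s.lt(s.cummax())

# Variant 1: keep every row but replace the bad values with NaN.
# The dtype becomes float64 because NaN is a float.
masked = s.mask(bad)

# Variant 2: drop the bad rows entirely; the dtype stays int64.
dropped = s[~bad]

print(masked)
print(dropped)
```

If the values live in a DataFrame column rather than a standalone Series, the same expressions should work on the column, for example with s = df["value"], where "value" is a placeholder column name.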