Remove dips of data in Pandas dataframe

Time:07-19

I have some time series data in a pandas DataFrame that I know should always increase, but it contains some incorrect low values, like below.

22-01-17   0
22-01-18   45
22-01-19   78
22-01-20   98
22-01-21   6            // bad
22-01-22   7            // bad
22-01-23   4            // bad
22-01-24   101

How can I remove regions of the data that are less than the previous good value?

I don't mind if we remove those values or replace them with the last good value.

So, using the example above, how could I get either

22-01-17   0
22-01-18   45
22-01-19   78
22-01-20   98
22-01-21   98
22-01-22   98
22-01-23   98
22-01-24   101

or

22-01-17   0
22-01-18   45
22-01-19   78
22-01-20   98
22-01-21   NaN
22-01-22   NaN
22-01-23   NaN
22-01-24   101

Thanks

CodePudding user response:

Assuming s is your Series.

To get the first option:

s.cummax()

output:

22-01-17      0
22-01-18     45
22-01-19     78
22-01-20     98
22-01-21     98
22-01-22     98
22-01-23     98
22-01-24    101
dtype: int64
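
For anyone who wants to reproduce this, here is a minimal sketch. The Series construction is an assumption: the question does not show how the data is stored, so the string index and the variable names s and filled are only illustrative.

```python
import pandas as pd

# Hypothetical reconstruction of the sample data from the question.
s = pd.Series(
    [0, 45, 78, 98, 6, 7, 4, 101],
    index=["22-01-17", "22-01-18", "22-01-19", "22-01-20",
           "22-01-21", "22-01-22", "22-01-23", "22-01-24"],
)

# Running maximum: every dip is replaced by the highest value seen so far.
filled = s.cummax()
print(filled)
```

Since the data should only ever increase, the running maximum is the same as the "last good value" at every dip.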

For the second:

s.mask(s.lt(s.cummax()))

output:

22-01-17      0.0
22-01-18     45.0
22-01-19     78.0
22-01-20     98.0
22-01-21      NaN
22-01-22      NaN
22-01-23      NaN
22-01-24    101.0
dtype: float64
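
If the values sit in a DataFrame column rather than a standalone Series, the same idea should carry over. The sketch below assumes a column named "value" (a hypothetical name, as are df, is_dip, and cleaned) and also shows how the bad rows could be dropped instead of masked.

```python
import pandas as pd

# Hypothetical DataFrame holding the sample data; "value" is an assumed column name.
df = pd.DataFrame(
    {"value": [0, 45, 78, 98, 6, 7, 4, 101]},
    index=["22-01-17", "22-01-18", "22-01-19", "22-01-20",
           "22-01-21", "22-01-22", "22-01-23", "22-01-24"],
)

# A row is a dip when it is strictly below the running maximum.
is_dip = df["value"].lt(df["value"].cummax())

# Option A: keep every row but blank out the dips (the column becomes float).
df["masked"] = df["value"].mask(is_dip)

# Option B: drop the dip rows entirely (the column stays integer).
cleaned = df.loc[~is_dip, ["value"]]

print(df)
print(cleaned)
```

Masking turns the column into float64 because NaN is a float, which matches the output above; dropping the rows avoids that conversion.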