I have a dataset of statistics that I extract from text. The extraction sometimes fails, and I need to correct its output. The values are supposed to be cumulative, but some of them come out wrong.
It is time-series data, so the values should never decrease over time. Here is a sample of what I currently get:
df
date value
2021-07-20 21347.0
2021-07-24 21739.0
2021-08-02 22.0
2021-08-03 22.0
2021-08-06 22947.0
2021-08-17 4.0
As you can see, the data is cumulative, but some values were extracted incorrectly (22.0 and 4.0 are far below the earlier totals).
I would like such values to be replaced with NaN.
How can I do that? The expected result is:
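A minimal sketch of the rule described above, assuming that incorrect values are only ever too small (never spuriously large): a value is kept only if it is not below the largest value seen before it. It uses pandas `Series.cummax`, `Series.shift` and `Series.where` on the sample data.

```python
import pandas as pd

# Sample data from the question (dates kept as strings for brevity).
df = pd.DataFrame({
    "date": ["2021-07-20", "2021-07-24", "2021-08-02",
             "2021-08-03", "2021-08-06", "2021-08-17"],
    "value": [21347.0, 21739.0, 22.0, 22.0, 22947.0, 4.0],
})

# Largest value seen strictly before each row (NaN for the first row).
prev_max = df["value"].cummax().shift()

# Keep a value if it is not below anything seen earlier. The first row has
# no predecessor (prev_max is NaN there), so it is kept explicitly.
valid = df["value"].ge(prev_max) | prev_max.isna()

# Series.where replaces values where `valid` is False with NaN.
df["value"] = df["value"].where(valid)
print(df)
```

Equal consecutive values are kept, since a cumulative total can legitimately stay flat. Because the running maximum is taken over the raw values, a single spuriously large reading would hide later valid values, which is why the assumption above matters.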
df
date value
2021-07-20 21347.0
2021-07-24 21739.0
2021-08-02 nan
2021-08-03 nan
2021-08-06 22947.0
2021-08-17 nan
Answer:
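You can do that with NumPy by marking every value smaller than the first value as NaN:
import numpy as np
df['value'] = np.where(df['value'] < df['value'].iloc[0], np.nan, df['value'])  # below the first reading -> NaN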
Output:
date value
0 2021-07-20 21347.0
1 2021-07-24 21739.0
2 2021-08-02 nan
3 2021-08-03 nan
4 2021-08-06 22947.0
5 2021-08-17 nan
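Comparing with the first row works for the sample, but it only catches values below the first reading. A small sketch with a hypothetical series (the numbers are invented for illustration) shows a value that is above the first reading yet below an earlier one, and how comparing with the running maximum of earlier rows would catch it.

```python
import numpy as np
import pandas as pd

# Hypothetical series: 150.0 is above the first reading but below 200.0,
# so it cannot be a valid cumulative value.
s = pd.Series([100.0, 200.0, 150.0, 300.0])

# Comparison with the first row only, as in the answer above: 150.0 survives.
first_only = pd.Series(np.where(s < s.iloc[0], np.nan, s))

# Comparison with the running maximum of all earlier rows: 150.0 is dropped.
prev_max = s.cummax().shift()
running = s.where(s.ge(prev_max) | prev_max.isna())

print(first_only.tolist())  # [100.0, 200.0, 150.0, 300.0]
print(running.tolist())     # [100.0, 200.0, nan, 300.0]
```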
Answer:
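Try comparing each value with the value in the previous row:
import numpy as np
# Previous row's value; NaN for the first row
df['check'] = df['value'].shift(1)
# Keep a value only if it exceeds the previous one; always keep the first row
df['value'] = np.where((df['value'] > df['check']) | df['check'].isna(), df['value'], np.nan)
df = df.drop(columns='check')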
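Comparing with the previous row has a similar blind spot: after one bad value, a second bad value that is slightly larger passes the check. A sketch with a hypothetical series (the 23.0 is invented for illustration) contrasts it with a comparison against the running maximum of earlier rows.

```python
import numpy as np
import pandas as pd

# Hypothetical series: two consecutive bad readings, where the second (23.0)
# is slightly larger than the first (22.0).
s = pd.Series([21347.0, 21739.0, 22.0, 23.0, 22947.0])

# Previous-row comparison, as in the answer above, keeping the first row.
prev = s.shift()
prev_row = pd.Series(np.where((s > prev) | prev.isna(), s, np.nan))

# Running-maximum comparison: each value must not fall below any earlier value.
prev_max = s.cummax().shift()
running = s.where(s.ge(prev_max) | prev_max.isna())

print(prev_row.tolist())  # 23.0 survives because it exceeds 22.0
print(running.tolist())   # 23.0 is replaced with NaN
```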