Need to replace NaN values in a time-series dataframe using custom logic

Time:08-13

import numpy as np
import pandas as pd
df = pd.DataFrame({'date': ['3/10/2000', '3/11/2000', '3/12/2000', '3/13/2000', '3/14/2000', '3/15/2000', '3/16/2000', '3/17/2000', '3/18/2000'],
                   'value': [2, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 25]})

In this dataframe, I want to replace the NaN values using the following logic. The two non-NaN values are 8 days apart (3/18/2000 - 3/10/2000 = 8 days), and their difference is delta = 25 - 2 = 23. For every day t between them that has a NaN value (t counted in days from 3/10/2000), I want the value to be 2 + delta * (t/8).

My desired value column is: [2, 4.875, 7.75, 10.625, 13.5, 16.375, 19.25, 22.125, 25]
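As a check on the formula, here is a minimal sketch that computes first + delta * t / n directly from day offsets, without calling interpolate. It assumes, as in this example, that only the first and last rows hold known values; the names `t`, `n`, `first`, `last`, `delta` and `line` are illustrative and not part of any API.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'date': ['3/10/2000', '3/11/2000', '3/12/2000', '3/13/2000',
                            '3/14/2000', '3/15/2000', '3/16/2000', '3/17/2000',
                            '3/18/2000'],
                   'value': [2] + [np.nan] * 7 + [25]})

dates = pd.to_datetime(df['date'], format='%m/%d/%Y')
# t: days elapsed since the first date (0 for the first row)
t = (dates - dates.iloc[0]).dt.days
n = t.iloc[-1]                       # total span in days: 8
first, last = df['value'].iloc[0], df['value'].iloc[-1]
delta = last - first                 # 25 - 2 = 23

# Assumption: only the first and last values are known, so every NaN
# lies on the single straight line between them.
line = first + delta * t / n
df['value'] = df['value'].fillna(line)
print(df['value'].tolist())
# [2.0, 4.875, 7.75, 10.625, 13.5, 16.375, 19.25, 22.125, 25.0]
```

With more than two known values this single-line formula no longer applies; each gap would need its own pair of endpoints, which is what interpolation does for you.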

Answer:

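You can convert the date column to datetime, set it as the index, and interpolate with the 'index' method, which interpolates linearly in the actual index values (here, elapsed time) rather than in row position: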

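df['value'] = (df
 .assign(date=pd.to_datetime(df['date']))  # parse the date strings
 .set_index('date')['value']               # index the values by date
 .interpolate(method='index')              # linear in elapsed time between known points
 .to_numpy()                               # drop the date index so values align by position
)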

output:

        date   value
0  3/10/2000   2.000
1  3/11/2000   4.875
2  3/12/2000   7.750
3  3/13/2000  10.625
4  3/14/2000  13.500
5  3/15/2000  16.375
6  3/16/2000  19.250
7  3/17/2000  22.125
8  3/18/2000  25.000
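Because the dates in this question are consecutive days, the default 'linear' method (which treats rows as equally spaced) would give the same result here. The 'index' method matters when the dates are irregular. A small sketch with made-up, unevenly spaced dates (a 1-day gap followed by a 7-day gap):

```python
import numpy as np
import pandas as pd

# Hypothetical data: the gaps between rows are 1 day, then 7 days.
s = pd.Series([2, np.nan, 25],
              index=pd.to_datetime(['2000-03-10', '2000-03-11', '2000-03-18']))

# Default 'linear' ignores the index and treats rows as equally spaced.
by_position = s.interpolate()
# 'index' weights the fill by the elapsed time between index values.
by_time = s.interpolate(method='index')

print(by_position.tolist())        # [2.0, 13.5, 25.0]
print(by_time.round(3).tolist())   # [2.0, 4.875, 25.0]
```

Only the second result matches the 2 + delta * (t/8) rule, since 3/11/2000 is 1 of the 8 days into the gap.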