Time series - replacing missing values with week-ahead data

Time:11-11

I have a time series data frame with some missing values in the price column, and I want to replace them with the week-ahead prices. The missing values are in the range from 2015-01-01 to 2015-01-07.

First I tried to select the rows with the missing values in the date range via df.loc[].

Sorry if my code is total crap. I'm pretty new to Python and struggling.

import pandas as pd
import numpy as np
time = pd.date_range(start='2015-01-01', end='2015-01-20', freq='D')
sampl = np.random.uniform(low=-4.5, high=34.0, size=(20,))
df = pd.DataFrame({"prices": sampl}, index=time)
# Blank out the first week to simulate the missing values
df.loc['2015-01-01':'2015-01-07', 'prices'] = np.nan
df

So I tried

df = df.replace([df.loc['2015-01-01':'2015-01-07']],[ df.loc['2015-01-08':'2015-01-14']])

to replace the missing values with the values from the week ahead, but it doesn't work. The time series above is only an example. My original series has hourly data from 2015-01-01 00:00:00 to 2015-12-31 23:00:00, and its missing values are in the range from 2015-01-01 to 2015-01-04.

I hope someone has a better approach.

CodePudding user response:

You could shift the data backwards by seven rows (one week of daily data) and copy the shifted values into the missing range:

import pandas as pd
import numpy as np
time = pd.date_range(start='2015-01-01', end='2015-01-20', freq='D')
sampl = np.random.uniform(low=-4.5, high=34.0, size=(20,))
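df = pd.DataFrame({"prices": sampl}, index=time)
# Blank out the first week to simulate the missing values
df.loc['2015-01-01':'2015-01-07', 'prices'] = np.nan

# shift(-7) moves every value up seven rows, so each date holds the price from one week later
new_data = df.shift(-7).loc["2015-01-01":"2015-01-07"]

# Overwrite the missing week with the week-ahead prices
df.loc["2015-01-01":"2015-01-07"] = new_data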

df
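
For the hourly series mentioned in the question, shift(-7) would only move the values by seven hours, not one week. A minimal sketch of one way around that, assuming a regular hourly DatetimeIndex: shift the timestamps by seven days with the freq argument and let fillna align on the index. The names hourly, original and week_ahead, and the three-week sample range, are made up for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly sample: three weeks of data starting 2015-01-01
idx = pd.date_range(start="2015-01-01 00:00", periods=21 * 24,
                    freq=pd.Timedelta(hours=1))
rng = np.random.default_rng(0)
hourly = pd.DataFrame({"prices": rng.uniform(-4.5, 34.0, size=len(idx))},
                      index=idx)
original = hourly["prices"].copy()  # kept only to check the result

# Simulate the missing range from the question (all hours of Jan 1 to Jan 4)
hourly.loc["2015-01-01":"2015-01-04", "prices"] = np.nan

# Move every timestamp back by 7 days, so the label 2015-01-01 05:00
# now carries the price observed at 2015-01-08 05:00
week_ahead = hourly["prices"].shift(periods=-7, freq="D")

# fillna aligns on the index and only touches the NaN entries
hourly["prices"] = hourly["prices"].fillna(week_ahead)

print(hourly["prices"].isna().sum())
```

Because the shift is expressed in calendar time rather than in rows, this should still line up if a few hours are absent from the index. A row-based shift(-7 * 24) is an alternative, but it relies on every hour being present.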
