I have a time series data frame with some missing values in the price column, and I want to replace them with the prices from one week ahead. The missing values are in the range from 2015-01-01 to 2015-01-07.
First I tried to select the rows with the missing values in the date range via df.loc[].
Sorry if my code is total crap. I'm pretty new to Python and struggling.
import pandas as pd
import numpy as np
time = pd.date_range(start='2015-01-01', end='2015-01-20', freq='D')
sampl = np.random.uniform(low=-4.5, high=34.0, size=(20,))
df = pd.DataFrame({"prices": sampl}, index=time)
# Set the first week to NaN to simulate the missing values
df.loc['2015-01-01':'2015-01-07', 'prices'] = np.nan
df
So I tried
df = df.replace([df.loc['2015-01-01':'2015-01-07']],[ df.loc['2015-01-08':'2015-01-14']])
to replace the missing values with the values from the week ahead, but it doesn't work well. The time series above is just an example. My original series has hourly data from 2015-01-01 00:00:00 to 2015-12-31 23:00:00, and the missing values are located in the range from 2015-01-01 to 2015-01-04.
I hope someone has a better approach.
CodePudding user response:
You could shift the data backwards by seven rows (one week for daily data) and assign the shifted values to the missing range:
import pandas as pd
import numpy as np
time = pd.date_range(start='2015-01-01', end='2015-01-20', freq='D')
sampl = np.random.uniform(low=-4.5, high=34.0, size=(20,))
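df = pd.DataFrame({"prices": sampl}, index=time)
# Set the first week to NaN to simulate the missing values
df.loc['2015-01-01':'2015-01-07', 'prices'] = np.nan
# shift(-7) moves each value 7 rows earlier, so 2015-01-08 lines up with 2015-01-01
new_data = df.shift(-7).loc["2015-01-01":"2015-01-07"]
df.loc["2015-01-01":"2015-01-07"] = new_data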
df
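For the hourly series mentioned in the question, a row-based shift would need 7 * 24 = 168 rows instead of 7. Shifting by time rather than by row count avoids that bookkeeping. Below is a minimal sketch, assuming a regular hourly DatetimeIndex and that the prices one week later are present; the sample data and the names `hourly` and `week_ahead` are made up for illustration. It uses fillna so that only the missing rows are touched.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly sample data standing in for the real series
idx = pd.date_range("2015-01-01 00:00", "2015-01-20 23:00", freq=pd.Timedelta(hours=1))
rng = np.random.default_rng(0)
hourly = pd.DataFrame({"prices": rng.uniform(-4.5, 34.0, size=len(idx))}, index=idx)
original = hourly["prices"].copy()

# Simulate the gap described in the question (2015-01-01 through 2015-01-04)
hourly.loc["2015-01-01":"2015-01-04", "prices"] = np.nan

# Move the index back by 7 days, so each timestamp carries the price from one week later
week_ahead = hourly["prices"].shift(-7, freq="D")

# Fill only the missing rows; fillna aligns the two series on their timestamps
hourly["prices"] = hourly["prices"].fillna(week_ahead)
```

If the week-ahead value is itself missing, the corresponding row stays NaN, so it may be worth checking hourly["prices"].isna().sum() afterwards.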