I am looking to take annual population data and interpolate it into an hourly time series. I am trying to write a function that produces, for each unique name, an hourly population time series covering the sample years given. I have included the code below, along with example data:
import pandas as pd
import random
from scipy.interpolate import interp1d

name = ['RI', 'NH', 'MA', 'RI', 'NH', 'MA', 'RI', 'NH', 'MA', 'RI', 'NH', 'MA']
year = [2015, 2015, 2015, 2016, 2016, 2016, 2017, 2017, 2017, 2018, 2018, 2018]
population = random.sample(range(10000, 300000), 12)
df_pop = pd.DataFrame(list(zip(name, year, population)), columns=['name', 'year', 'population'])
start_year = 2015
end_year = 2018

def pop_sum(df_pop, start_year, end_year):
    names = df_pop['name'].unique()
    df = pd.DataFrame([])
    for i in names:
        t = df_pop['year']
        y1 = df_pop['population']
        x = pd.DataFrame({'Hours': pd.date_range(f'{start_year}-01-01', f'{end_year}-12-31',
                                                 freq='1H', closed='left')})
        pop_interp = interp1d(t, y1, x, 'linear')
        df = df.append(pop_interp)
    return df
However, this script does not work and does not loop over the names. I looked for resources online, but converting from an annual to an hourly time series is far less common than, say, hourly to annual. I have tried scipy's interp1d, but I am open to suggestions of other packages that can do the same job. Thank you in advance for your suggestions.
CodePudding user response:
I notice that even though you are looping through an array of names, you never use the name inside the loop: the loop header is for i in names, but i does not appear anywhere in the loop body. Because of this, every iteration produces the same result as the last, since nothing changes from one iteration to the next.
Since you are appending each iteration to the bottom of a new dataframe, all results will end up in the same columns. So, what you can do is pull out a small dataframe for each name, and then use THAT data to do the calculation. You also want to either make the name the index, or add a column called 'name' to the final dataframe.
Something like
names = df_pop['name'].unique()
rows = []
for i in names:
    condition = df_pop['name'] == i  # define condition where name is i
    mini_df = df_pop[condition]  # all rows where condition is met
    t = mini_df['year']
    y1 = mini_df['population']
    # interp1d(x, y, kind=...) returns a function of year; it is evaluated later
    pop_interp = interp1d(t, y1, kind='linear')
    rows.append({'name': i, 'function': pop_interp})  # make a new row to append
# DataFrame.append was removed in pandas 2.0, so build the frame from the collected rows
df = pd.DataFrame(rows, columns=['name', 'function'])
Assuming interp1d is what you want (I'm not that familiar with it), I think this structure will work better for getting unique results for each name.
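To get actual hourly values out of a per-name interpolating function, the function has to be evaluated on a numeric hourly grid, since interp1d works on numbers rather than timestamps. The following is only a sketch under some assumptions: each annual figure is treated as the value on January 1 of its year, the sample numbers are made up, and hourly_population and to_hours are hypothetical helper names.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import interp1d

# Made-up sample data in the same shape as df_pop in the question.
df_pop = pd.DataFrame({
    'name': ['RI', 'NH', 'RI', 'NH', 'RI', 'NH'],
    'year': [2015, 2015, 2016, 2016, 2017, 2017],
    'population': [100.0, 200.0, 110.0, 220.0, 130.0, 210.0],
})

EPOCH = pd.Timestamp('1970-01-01')

def to_hours(timestamps):
    """Convert timestamps to float hours since the epoch (hypothetical helper)."""
    return np.asarray((timestamps - EPOCH) / pd.Timedelta(hours=1), dtype=float)

def hourly_population(df_pop, start_year, end_year):
    # Hourly grid from Jan 1 of start_year to Jan 1 of end_year, both included.
    hours = pd.date_range(f'{start_year}-01-01', f'{end_year}-01-01', freq='h')
    frames = []
    for name, group in df_pop.groupby('name'):
        group = group.sort_values('year')
        # Assumption: each annual value is dated Jan 1 of its year.
        known = pd.to_datetime(group['year'].astype(str) + '-01-01')
        f = interp1d(to_hours(known), group['population'].to_numpy(dtype=float),
                     kind='linear')
        frames.append(pd.DataFrame({
            'name': name,
            'Hours': hours,
            'population': f(to_hours(hours)),
        }))
    return pd.concat(frames, ignore_index=True)

result = hourly_population(df_pop, 2015, 2017)
print(result.head())
```

The grid stops at January 1 of the last year because there is no later data point to interpolate toward. Going to the end of that year would mean extrapolating, which interp1d only does if asked to (for example with fill_value='extrapolate').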
CodePudding user response:
You can convert year to datetime, set it as the index, reindex to hourly frequency, and interpolate using
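The interpolation call is not named above, so the sketch below makes its own choice: it assumes pandas' Series.interpolate with method='time', treats each annual figure as the value on January 1 of its year, and uses made-up sample numbers. The function name hourly_by_reindex is hypothetical.

```python
import pandas as pd

# Made-up sample data in the same shape as df_pop in the question.
df_pop = pd.DataFrame({
    'name': ['RI', 'NH', 'RI', 'NH', 'RI', 'NH'],
    'year': [2015, 2015, 2016, 2016, 2017, 2017],
    'population': [100.0, 200.0, 110.0, 220.0, 130.0, 210.0],
})

def hourly_by_reindex(df_pop):
    frames = []
    for name, group in df_pop.groupby('name'):
        group = group.sort_values('year')
        # Convert year to datetime (assumption: Jan 1 of each year) and use it as the index.
        idx = pd.DatetimeIndex(pd.to_datetime(group['year'].astype(str) + '-01-01'))
        s = pd.Series(group['population'].to_numpy(dtype=float), index=idx,
                      name='population')
        # Reindex to hourly frequency; the new hourly rows start out as NaN.
        hours = pd.date_range(idx.min(), idx.max(), freq='h')
        # Fill the NaN rows by interpolating linearly in time.
        filled = s.reindex(hours).interpolate(method='time')
        frames.append(filled.rename_axis('Hours').reset_index().assign(name=name))
    return pd.concat(frames, ignore_index=True)

hourly = hourly_by_reindex(df_pop)
print(hourly.head())
```

This covers the span from the first to the last annual data point for each name. Hours after the last data point are not produced, because there is nothing to interpolate toward.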