In pandas can I create empty rows for time series?

Time:07-20

It's been a while, and I'm not sure how to do this in pandas. Say I have a time series like this:

# creating time series
import pandas as pd

df = pd.DataFrame({'a': [0,1,2], 'date':[pd.to_datetime('2017-04-01'),
                                       pd.to_datetime('2020-04-01'),
                                       pd.to_datetime('2022-04-01')]})    
s = df.set_index('date')['a']
s
date
2017-04-01    0
2020-04-01    1
2022-04-01    2
Name: a, dtype: int64

I would like to expand the dataset to fill in daily values from the start date through 2050, so that I can interpolate over it.

I know I can call s.interpolate(method='linear'), but I believe I need rows containing NaN for that to work.

How can I extend the dataframe?

CodePudding user response:

It's probably easier to create the full date range and join your values onto it than to insert the missing rows yourself. We can use pd.date_range() to create a DatetimeIndex covering the range you want and convert it to a dataframe, so we can just use the join function, which combines dataframes by index.

import pandas as pd

df = pd.DataFrame({'a': [0,1,2], 'date':[pd.to_datetime('2017-04-01'),
                                         pd.to_datetime('2020-04-01'),
                                         pd.to_datetime('2022-04-01')]})    
s = df.set_index('date')['a']

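# Daily DatetimeIndex (the default frequency is 'D'), turned into a dataframe indexed by date
full_date_df = pd.date_range(start='2017-04-01', end='2050-12-31').to_frame(name='date')
# Left join on the index: dates with no observation get NaN in column 'a'
df_w_nulls = full_date_df.join(s)
df_w_nulls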
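
To connect this back to the interpolation the question asks about, here is a minimal sketch of the join followed by interpolate. The names full_index and filled are mine, and I'm assuming the series is indexed by date as in the question.

```python
import pandas as pd

# Same three observations as in the question, indexed by date.
s = pd.Series(
    [0, 1, 2],
    index=pd.to_datetime(['2017-04-01', '2020-04-01', '2022-04-01']),
    name='a',
)

# Daily index from the first observation through the end of 2050.
full_index = pd.date_range(start=s.index.min(), end='2050-12-31', freq='D')

# Left join on the index: every day without an observation gets NaN in 'a'.
df_w_nulls = full_index.to_frame(name='date').join(s)

# Interpolate between the known points, weighting by elapsed time.
filled = df_w_nulls['a'].interpolate(method='time')
```

Interpolation only has something to work with between known points. With the defaults, the days after 2022-04-01 should just carry the last value forward rather than continue the trend, so check that this is what you want for the stretch out to 2050.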

CodePudding user response:

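You can try reindexing onto a new date range (yearly here, with the missing values filled with 0):

# 'AS-APR' is a yearly frequency anchored on April 1 ('YS-APR' is the equivalent alias in newer pandas versions)
date = pd.date_range(start='2017-04-01', end='2050-04-01', freq='AS-APR')
s = s.reindex(date).fillna(0)
print(s)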

2017-04-01    0.0
2018-04-01    0.0
2019-04-01    0.0
2020-04-01    1.0
2021-04-01    0.0
2022-04-01    2.0
2023-04-01    0.0
2024-04-01    0.0
2025-04-01    0.0
2026-04-01    0.0
2027-04-01    0.0
2028-04-01    0.0
2029-04-01    0.0
2030-04-01    0.0
2031-04-01    0.0
2032-04-01    0.0
2033-04-01    0.0
2034-04-01    0.0
2035-04-01    0.0
2036-04-01    0.0
2037-04-01    0.0
2038-04-01    0.0
2039-04-01    0.0
2040-04-01    0.0
2041-04-01    0.0
2042-04-01    0.0
2043-04-01    0.0
2044-04-01    0.0
2045-04-01    0.0
2046-04-01    0.0
2047-04-01    0.0
2048-04-01    0.0
2049-04-01    0.0
2050-04-01    0.0
Freq: AS-APR, Name: a, dtype: float64
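
If the goal is daily rows that can be interpolated, the same reindex idea can be used with a daily frequency and without fillna(0), so the gaps stay NaN. This is only a sketch under that assumption; daily, s_daily and s_interp are illustrative names.

```python
import pandas as pd

# Same three observations as in the question, indexed by date.
s = pd.Series(
    [0, 1, 2],
    index=pd.to_datetime(['2017-04-01', '2020-04-01', '2022-04-01']),
    name='a',
)

# One entry per day instead of one per year.
daily = pd.date_range(start=s.index.min(), end='2050-04-01', freq='D')

# reindex keeps the known values and inserts NaN for every new date.
s_daily = s.reindex(daily)

# With the NaN rows in place, linear interpolation can fill the gaps.
s_interp = s_daily.interpolate(method='linear')
```

Filling with 0 before interpolating would leave nothing to interpolate, because every row would already hold a value.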

CodePudding user response:

I recommend creating a new dataframe that has the dates you are looking for. One way to do it:

import pandas as pd
from pandas import Timestamp
start = Timestamp('2017-04-01')
end = Timestamp('2050-01-01')

all_dates = pd.date_range(start=start, end=end, freq='D')

# You can fill the "Value" column with any value you want. Here I filled with 0.
df = pd.DataFrame({'Date':all_dates , 'Value':[0]*len(all_dates)})
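
One caveat with a 0 placeholder is that the question's series contains a real 0, so placeholders and observations become indistinguishable. A possible variation, sketched here with NaN as the placeholder and the dates as the index (my choice, not part of the answer above), then copies the known values in:

```python
import numpy as np
import pandas as pd

# Same three observations as in the question, indexed by date.
s = pd.Series(
    [0, 1, 2],
    index=pd.to_datetime(['2017-04-01', '2020-04-01', '2022-04-01']),
    name='a',
)

all_dates = pd.date_range(start='2017-04-01', end='2050-01-01', freq='D')

# NaN marks "no observation", which keeps it distinct from a genuine 0.
df = pd.DataFrame({'Value': np.nan}, index=all_dates)

# Copy the known observations onto their matching dates.
df.loc[s.index, 'Value'] = s.to_numpy()
```

From here df['Value'].interpolate() has the NaN rows it needs.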