I have a list of non-consecutive dates. I'm reading them into a dataframe and would like to convert those dates to an hourly level.
All of my searching led to resampling. I managed to convert the date column to a DatetimeIndex and resample, but that filled in every single hour from the first date to the last, whereas I'm looking for just the 24 hours of each date.
import pandas as pd

d = {'date': ["1/1/2014", "5/26/2014", "7/4/2014"]}
df = pd.DataFrame(data=d)
df['date'] = pd.to_datetime(df['date'])
df['date'] = df['date'].dt.tz_localize('US/Eastern')
df = df.set_index('date')
df = df.resample('h').ffill()
This creates a dataframe whose index contains every hour from 1/1/2014 to 7/4/2014 (4416 in total).
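A minimal sketch that reproduces this with the question's three dates (the variable names dates, full and kept are just for illustration). It also tries trimming the resampled grid back to the listed days with normalize() and isin(), which shows why resampling is an awkward starting point: the hourly grid ends at the last timestamp, so 7/4 keeps only its midnight hour.

```python
import pandas as pd

# The question's three dates, localized to US/Eastern and used as the index
dates = pd.to_datetime(["1/1/2014", "5/26/2014", "7/4/2014"]).tz_localize("US/Eastern")
df = pd.DataFrame(index=dates.rename("date"))

# Upsampling lays one regular hourly grid from the first timestamp to the last
full = df.resample("h").ffill()
print(len(full))  # 4416

# Filtering the grid back to the listed days is not enough on its own:
# the grid stops at 2014-07-04 00:00, so 7/4 keeps only its midnight hour
mask = full.index.normalize().isin(dates)
kept = full.loc[mask]
print(len(kept))  # 49
```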
What I actually want is to have a dataframe that looks like this:
date
2014-01-01 00:00:00-05:00
2014-01-01 01:00:00-05:00
2014-01-01 02:00:00-05:00
2014-01-01 03:00:00-05:00
2014-01-01 04:00:00-05:00
2014-01-01 05:00:00-05:00
2014-01-01 06:00:00-05:00
2014-01-01 07:00:00-05:00
2014-01-01 08:00:00-05:00
2014-01-01 09:00:00-05:00
2014-01-01 10:00:00-05:00
2014-01-01 11:00:00-05:00
2014-01-01 12:00:00-05:00
2014-01-01 13:00:00-05:00
2014-01-01 14:00:00-05:00
2014-01-01 15:00:00-05:00
2014-01-01 16:00:00-05:00
2014-01-01 17:00:00-05:00
2014-01-01 18:00:00-05:00
2014-01-01 19:00:00-05:00
2014-01-01 20:00:00-05:00
2014-01-01 21:00:00-05:00
2014-01-01 22:00:00-05:00
2014-01-01 23:00:00-05:00
2014-05-26 00:00:00-04:00
2014-05-26 01:00:00-04:00
2014-05-26 02:00:00-04:00
2014-05-26 03:00:00-04:00
2014-05-26 04:00:00-04:00
2014-05-26 05:00:00-04:00
2014-05-26 06:00:00-04:00
2014-05-26 07:00:00-04:00
2014-05-26 08:00:00-04:00
2014-05-26 09:00:00-04:00
2014-05-26 10:00:00-04:00
2014-05-26 11:00:00-04:00
2014-05-26 12:00:00-04:00
2014-05-26 13:00:00-04:00
2014-05-26 14:00:00-04:00
2014-05-26 15:00:00-04:00
2014-05-26 16:00:00-04:00
2014-05-26 17:00:00-04:00
2014-05-26 18:00:00-04:00
2014-05-26 19:00:00-04:00
2014-05-26 20:00:00-04:00
2014-05-26 21:00:00-04:00
2014-05-26 22:00:00-04:00
2014-05-26 23:00:00-04:00
And then it skips to 7/4 and includes just the 24 hours of that date.
Answer:
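Instead of resampling, create a date range for each date that runs from the date itself to 23 hours later with a frequency of 1 hour (24 timestamps, since pd.date_range includes the end point), and then explode them all together:
# assumes 'date' is still a column, i.e. before df.set_index('date')
hours = df['date'].apply(lambda x: pd.date_range(x, x + pd.tseries.offsets.Hour(23), freq='h')).explode()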
Output:
>>> hours
0 2014-01-01 00:00:00-05:00
0 2014-01-01 01:00:00-05:00
0 2014-01-01 02:00:00-05:00
0 2014-01-01 03:00:00-05:00
0 2014-01-01 04:00:00-05:00
...
2 2014-07-04 19:00:00-04:00
2 2014-07-04 20:00:00-04:00
2 2014-07-04 21:00:00-04:00
2 2014-07-04 22:00:00-04:00
2 2014-07-04 23:00:00-04:00
Name: date, Length: 72, dtype: datetime64[ns, US/Eastern]
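A self-contained version of this approach, assuming the question's sample data with date kept as a column. The last step, which wraps the result in an empty dataframe indexed by the hours to match the layout the question asks for, is an addition and not part of the answer above. Recent pandas versions prefer the lowercase 'h' alias over 'H'.

```python
import pandas as pd

# Sample data from the question; 'date' stays a regular column here
df = pd.DataFrame({"date": ["1/1/2014", "5/26/2014", "7/4/2014"]})
df["date"] = pd.to_datetime(df["date"]).dt.tz_localize("US/Eastern")

# One DatetimeIndex of 24 hourly stamps per row (the end point is inclusive);
# explode() flattens them into a single Series, repeating each row label
hours = df["date"].apply(
    lambda x: pd.date_range(x, x + pd.tseries.offsets.Hour(23), freq="h")
).explode()

# Build the layout from the question: a frame indexed by the hourly stamps
result = pd.DataFrame(index=pd.DatetimeIndex(hours, name="date"))
print(len(result))  # 72
```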
Answer:
If I understand you correctly, pd.date_range from pandas is the right tool.
You can build the range for the first date like this, then do the same for the other dates (for example, in a loop) and concatenate the results.
import pandas as pd
pd.date_range(start='1/1/2014', end='1/1/2014 23:00', freq='h')
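The loop-and-concatenate idea, sketched for all three dates from the question. Passing periods=24 instead of an end date is a substitution here: it avoids month/day ambiguity in the end string and always yields exactly 24 stamps. The tz argument assumes the question's US/Eastern timezone.

```python
import pandas as pd

dates = ["1/1/2014", "5/26/2014", "7/4/2014"]

# One 24-hour range per date, localized to the question's timezone
ranges = [
    pd.date_range(start=d, periods=24, freq="h", tz="US/Eastern")
    for d in dates
]

# Concatenate the per-date ranges into a single DatetimeIndex
hours = ranges[0].append(ranges[1:]).rename("date")
df = pd.DataFrame(index=hours)
print(len(df))  # 72
```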
Answer:
# add offsets of 0..23 hours to each row's date, then flatten into one Series
hours = df.apply(lambda df_: df_['date'] + pd.to_timedelta(range(24), unit='h'), axis=1).explode()
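The same "date plus 0 to 23 hours" idea can also be done without a row-wise apply: repeat each date 24 times and add a tiled set of hour offsets. This vectorized variant uses NumPy's tile together with DatetimeIndex.repeat; it is a swapped-in technique, not the code from the answer above.

```python
import numpy as np
import pandas as pd

dates = pd.to_datetime(["1/1/2014", "5/26/2014", "7/4/2014"]).tz_localize("US/Eastern")

# Each date repeated 24 times: d0 x24, d1 x24, d2 x24
repeated = dates.repeat(24)
# Offsets 0h..23h, tiled once per date so they line up with `repeated`
offsets = pd.to_timedelta(np.tile(np.arange(24), len(dates)), unit="h")

# Element-wise addition gives the 72 hourly stamps, used as the index
hours = (repeated + offsets).rename("date")
df = pd.DataFrame(index=hours)
print(len(df))  # 72
```

Adding fixed timedeltas works in absolute time, so on a DST changeover day (for example 2014-03-09 in US/Eastern) the wall-clock hours would shift by one; none of the question's dates fall on such a day. The same caveat applies to the date_range(x, x + Hour(23)) approach.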