Use Pandas to convert a column of non-continuous dates to hourly values of those dates

I have a list of non-continuous dates. I'm reading these into a dataframe and would like to convert those dates to an hourly level.

All of my searching led to resampling. I managed to convert the date column to a DatetimeIndex and resample, but that filled in every single hour from the first date to the last, and I'm looking for just the 24 hours of each listed date.

import pandas as pd

d = {'date': ["1/1/2014", "5/26/2014", "7/4/2014"]}
df = pd.DataFrame(data=d)
df['date'] = pd.to_datetime(df['date'])
df['date'] = df['date'].dt.tz_localize('US/Eastern')
df = df.set_index('date')
# resample fills in every hour between the first and last date
df = df.resample('H').ffill()

This creates a dataframe whose index contains every hour (4416 in total) from 1/1/2014 to 7/4/2014.
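
The count can be checked without resampling. As a minimal sketch (the names start, end, full_span and n_hours are illustrative and not part of the code above), an hourly range between the first and last date covers the same span that resample fills in:

```python
import pandas as pd

start = pd.Timestamp("2014-01-01", tz="US/Eastern")
end = pd.Timestamp("2014-07-04", tz="US/Eastern")

# Every hour from the first date to the last, both endpoints included.
full_span = pd.date_range(start, end, freq="h")

# 184 calendar days would be 4416 hours; the spring-forward DST change
# removes one elapsed hour, and counting both endpoints adds one back.
n_hours = len(full_span)
```

Here n_hours comes out to 4416, of which only 72 (3 dates x 24 hours) are wanted.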

What I actually want is to have a dataframe that looks like this:

date
2014-01-01 00:00:00-05:00
2014-01-01 01:00:00-05:00
2014-01-01 02:00:00-05:00
2014-01-01 03:00:00-05:00
2014-01-01 04:00:00-05:00
2014-01-01 05:00:00-05:00
2014-01-01 06:00:00-05:00
2014-01-01 07:00:00-05:00
2014-01-01 08:00:00-05:00
2014-01-01 09:00:00-05:00
2014-01-01 10:00:00-05:00
2014-01-01 11:00:00-05:00
2014-01-01 12:00:00-05:00
2014-01-01 13:00:00-05:00
2014-01-01 14:00:00-05:00
2014-01-01 15:00:00-05:00
2014-01-01 16:00:00-05:00
2014-01-01 17:00:00-05:00
2014-01-01 18:00:00-05:00
2014-01-01 19:00:00-05:00
2014-01-01 20:00:00-05:00
2014-01-01 21:00:00-05:00
2014-01-01 22:00:00-05:00
2014-01-01 23:00:00-05:00
2014-05-26 00:00:00-04:00
2014-05-26 01:00:00-04:00
2014-05-26 02:00:00-04:00
2014-05-26 03:00:00-04:00
2014-05-26 04:00:00-04:00
2014-05-26 05:00:00-04:00
2014-05-26 06:00:00-04:00
2014-05-26 07:00:00-04:00
2014-05-26 08:00:00-04:00
2014-05-26 09:00:00-04:00
2014-05-26 10:00:00-04:00
2014-05-26 11:00:00-04:00
2014-05-26 12:00:00-04:00
2014-05-26 13:00:00-04:00
2014-05-26 14:00:00-04:00
2014-05-26 15:00:00-04:00
2014-05-26 16:00:00-04:00
2014-05-26 17:00:00-04:00
2014-05-26 18:00:00-04:00
2014-05-26 19:00:00-04:00
2014-05-26 20:00:00-04:00
2014-05-26 21:00:00-04:00
2014-05-26 22:00:00-04:00
2014-05-26 23:00:00-04:00

The output would then skip ahead to 7/4 and include just the 24 hours of that date.

CodePudding user response：

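Instead of resampling, create a date range for each date that covers its 24 hours (from the date to the date + 23 hours) with a frequency of 1H, and then explode them all together. This assumes date is still a regular column, that is, before the set_index call:

hours = df['date'].apply(lambda x: pd.date_range(x, x + pd.tseries.offsets.Hour(23), freq='1H')).explode()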

Output:

>>> hours
0   2014-01-01 00:00:00-05:00
0   2014-01-01 01:00:00-05:00
0   2014-01-01 02:00:00-05:00
0   2014-01-01 03:00:00-05:00
0   2014-01-01 04:00:00-05:00
               ...           
2   2014-07-04 19:00:00-04:00
2   2014-07-04 20:00:00-04:00
2   2014-07-04 21:00:00-04:00
2   2014-07-04 22:00:00-04:00
2   2014-07-04 23:00:00-04:00
Name: date, Length: 72, dtype: datetime64[ns, US/Eastern]
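
One caveat, offered as a hedged aside: adding 23 hours counts elapsed time, so on a date with a daylight-saving change the range no longer lines up with the calendar day. None of the three sample dates is affected. The sketch below uses a hypothetical helper, hours_of_day, that bounds the range by the next local midnight instead. It assumes pd.DateOffset(days=1) moves a tz-aware timestamp by one calendar day rather than by 24 elapsed hours.

```python
import pandas as pd


def hours_of_day(day):
    """Hypothetical helper: hourly stamps from local midnight up to,
    but not including, the next local midnight."""
    next_midnight = day + pd.DateOffset(days=1)  # calendar-day arithmetic
    # date_range includes both endpoints, so drop the trailing midnight.
    return pd.date_range(day, next_midnight, freq="h")[:-1]


regular = hours_of_day(pd.Timestamp("2014-01-01", tz="US/Eastern"))
spring_forward = hours_of_day(pd.Timestamp("2014-03-09", tz="US/Eastern"))
```

Under these assumptions regular has 24 entries and spring_forward has 23, because 02:00 does not exist on 2014-03-09 in US/Eastern.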

CodePudding user response：

If I understand you correctly, I think the right tool is pd.date_range from the pandas package. You can do this for the first date, then repeat it for the other dates (for example in a loop) and concatenate the results.

pd.date_range(start='1/1/2014', end='1/1/2014 23:00', freq='1H')
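
The loop-and-concatenate idea could look roughly like this. It is a sketch rather than the answerer's exact code: the names dates, ranges and hours are illustrative, and it uses periods=24 instead of an explicit end so each range has exactly 24 entries.

```python
import pandas as pd

dates = pd.to_datetime(["1/1/2014", "5/26/2014", "7/4/2014"])

# One hourly range of 24 steps per date, localized to US/Eastern.
ranges = [
    pd.date_range(start=day, periods=24, freq="h", tz="US/Eastern")
    for day in dates
]

# Concatenate the per-day ranges into a single DatetimeIndex.
hours = ranges[0].append(ranges[1:])

# An otherwise empty DataFrame indexed by those hours.
df_hours = pd.DataFrame(index=hours.rename("date"))
```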

CodePudding user response：

hours = df.apply(lambda df_: df_['date'] + pd.to_timedelta(range(24), unit='H'), axis=1).explode()
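
This one-liner adds hour offsets 0 through 23 to each date and explodes the result. The same idea can be sketched without apply by repeating each date 24 times and adding a tiled array of offsets. The names repeated, offsets and result are illustrative, and the sketch assumes date is still a regular, tz-aware column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"date": ["1/1/2014", "5/26/2014", "7/4/2014"]})
df["date"] = pd.to_datetime(df["date"]).dt.tz_localize("US/Eastern")

# Repeat each date 24 times; the original row labels 0, 1, 2 are repeated too.
repeated = df["date"].repeat(24)

# Hour offsets 0..23, tiled once per original date.
offsets = pd.to_timedelta(np.tile(np.arange(24), len(df)), unit="h")

# Element-by-element addition: each repeated date gets its own hour offset.
hours = repeated + offsets

# Use the hourly stamps as the index of an otherwise empty DataFrame.
result = pd.DataFrame(index=pd.DatetimeIndex(hours, name="date"))
```

The resulting index has 72 entries, 24 per input date, and keeps the US/Eastern time zone.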