Use Pandas to convert column of non continuous dates to hourly values of those dates

Time:06-07

I have a list of non-continuous dates. I'm reading these into a dataframe and would like to expand each date to an hourly level.

All of my searching led to resampling. I managed to convert the date column to a DatetimeIndex and resample, but that filled in every single hour from the first date to the last, and I'm looking for just the 24 hours of each listed date.

import pandas as pd

d = {'date': ["1/1/2014", "5/26/2014", "7/4/2014"]}
df = pd.DataFrame(data=d)
df['date'] = pd.to_datetime(df['date'])
df['date'] = df['date'].dt.tz_localize('US/Eastern')
df = df.set_index('date')
df = df.resample('H').ffill()

This creates a dataframe whose index contains every hour (4416 in total) from 1/1/2014 through 7/4/2014.
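To make the gap between the two results concrete, here is a small sketch that counts the hours in the full span and compares that with the 72 hours actually wanted. The names start, end, full_range and wanted_hours are only illustrative and not part of the original code.

```python
import pandas as pd

# Endpoints taken from the question, localized the same way (US/Eastern).
start = pd.Timestamp("2014-01-01", tz="US/Eastern")
end = pd.Timestamp("2014-07-04", tz="US/Eastern")

# Every hour between the two endpoints, both included.
# This mirrors what resampling to an hourly frequency fills in.
full_range = pd.date_range(start, end, freq=pd.offsets.Hour(1))

# Only three calendar days are wanted, 24 hours each.
wanted_hours = 3 * 24

print(len(full_range), wanted_hours)
```

Resampling works on the whole span between the first and last timestamp, so it cannot skip the days in between on its own.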

What I actually want is to have a dataframe that looks like this:

date
2014-01-01 00:00:00-05:00
2014-01-01 01:00:00-05:00
2014-01-01 02:00:00-05:00
2014-01-01 03:00:00-05:00
2014-01-01 04:00:00-05:00
2014-01-01 05:00:00-05:00
2014-01-01 06:00:00-05:00
2014-01-01 07:00:00-05:00
2014-01-01 08:00:00-05:00
2014-01-01 09:00:00-05:00
2014-01-01 10:00:00-05:00
2014-01-01 11:00:00-05:00
2014-01-01 12:00:00-05:00
2014-01-01 13:00:00-05:00
2014-01-01 14:00:00-05:00
2014-01-01 15:00:00-05:00
2014-01-01 16:00:00-05:00
2014-01-01 17:00:00-05:00
2014-01-01 18:00:00-05:00
2014-01-01 19:00:00-05:00
2014-01-01 20:00:00-05:00
2014-01-01 21:00:00-05:00
2014-01-01 22:00:00-05:00
2014-01-01 23:00:00-05:00
2014-05-26 00:00:00-04:00
2014-05-26 01:00:00-04:00
2014-05-26 02:00:00-04:00
2014-05-26 03:00:00-04:00
2014-05-26 04:00:00-04:00
2014-05-26 05:00:00-04:00
2014-05-26 06:00:00-04:00
2014-05-26 07:00:00-04:00
2014-05-26 08:00:00-04:00
2014-05-26 09:00:00-04:00
2014-05-26 10:00:00-04:00
2014-05-26 11:00:00-04:00
2014-05-26 12:00:00-04:00
2014-05-26 13:00:00-04:00
2014-05-26 14:00:00-04:00
2014-05-26 15:00:00-04:00
2014-05-26 16:00:00-04:00
2014-05-26 17:00:00-04:00
2014-05-26 18:00:00-04:00
2014-05-26 19:00:00-04:00
2014-05-26 20:00:00-04:00
2014-05-26 21:00:00-04:00
2014-05-26 22:00:00-04:00
2014-05-26 23:00:00-04:00

The output would then skip ahead to 7/4 and include only the 24 hours of that date.

CodePudding user response:

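Instead of resampling, build an hourly date range for each date, running from that date to 23 hours later with a frequency of 1H, and then explode the ranges into a single Series. Note that this uses df['date'] as a regular column, i.e. before the set_index call: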

hours = df['date'].apply(lambda x: pd.date_range(x, x + pd.tseries.offsets.Hour(23), freq='1H')).explode()

Output:

>>> hours
0   2014-01-01 00:00:00-05:00
0   2014-01-01 01:00:00-05:00
0   2014-01-01 02:00:00-05:00
0   2014-01-01 03:00:00-05:00
0   2014-01-01 04:00:00-05:00
               ...           
2   2014-07-04 19:00:00-04:00
2   2014-07-04 20:00:00-04:00
2   2014-07-04 21:00:00-04:00
2   2014-07-04 22:00:00-04:00
2   2014-07-04 23:00:00-04:00
Name: date, Length: 72, dtype: datetime64[ns, US/Eastern]

CodePudding user response:

If I understand correctly, pd.date_range from pandas is the right tool here. You can build the range for the first date like this, then do the same for the other dates (for example in a loop) and concatenate the results.

pd.date_range(start='1/1/2014', end='1/1/2014 23:00', freq='1H')
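The loop-and-concatenate idea could look roughly like the sketch below. It assumes the three sample dates from the question and the same US/Eastern localization; the names dates, ranges, hourly and hourly_df are illustrative. It uses periods=24 instead of an end date, and pd.offsets.Hour(1) instead of a frequency string so it does not depend on which hourly alias a given pandas version prefers.

```python
import pandas as pd

# Sample dates from the question, localized as in the original post.
dates = pd.to_datetime(
    pd.Series(["1/1/2014", "5/26/2014", "7/4/2014"])
).dt.tz_localize("US/Eastern")

# One range of 24 hourly steps per date, starting at that date's midnight.
ranges = [
    pd.date_range(start=d, periods=24, freq=pd.offsets.Hour(1))
    for d in dates
]

# Join the per-date ranges into a single DatetimeIndex.
hourly = ranges[0].append(ranges[1:])

# Empty dataframe indexed by the hourly timestamps, as in the desired output.
hourly_df = pd.DataFrame(index=hourly.rename("date"))

print(hourly_df.index)
```

Each range covers 24 consecutive elapsed hours, so a date that contains a daylight saving transition would need separate handling.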

CodePudding user response:

hours = df.apply(lambda df_: df_['date'] + pd.to_timedelta(range(24), unit='H'), axis=1).explode()
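The same "date plus 0 to 23 hours" idea can also be written without a row-wise apply. This is only a sketch, assuming df still has date as a regular column (before set_index); the names repeated and offsets are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the question's dataframe, keeping 'date' as a column.
df = pd.DataFrame({"date": ["1/1/2014", "5/26/2014", "7/4/2014"]})
df["date"] = pd.to_datetime(df["date"]).dt.tz_localize("US/Eastern")

# Repeat every date 24 times: d0, d0, ..., d1, d1, ...
repeated = df["date"].repeat(24).reset_index(drop=True)

# Matching hour offsets 0..23, tiled once per date.
offsets = pd.Series(
    pd.to_timedelta(np.tile(np.arange(24), len(df)), unit="h")
)

# Element-wise addition gives one timestamp per hour of each date.
hours = repeated + offsets

print(hours)
```

Both Series share a default 0..71 index, so the addition lines up row by row.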