Home > OS >  Pandas Resample creating new hourly data
Pandas Resample creating new hourly data

Time:10-22

I am using the resample funtion to go from minute data to hour data. The issue is my original DF has only from 10:30 to 15:59 data and the new resampled data is creating new hour data. How to I get rid of this data or have it resample only the time stamps on the index and not create new ones.

This is how the original DF looked: original  DF

This is how the resampled DF looks: enter image description here

My question is: how to I get ride of the other hour data from the resample and just include the previous hour data from the original DF?

Below is the code that I used to created the resampledDF

ROD['time'] = pd.to_datetime(ROD['timestamp'])
ROD.set_index('time', inplace = True, drop = True)
resampled = ROD.resample('60Min',origin='start').agg({'open':'first',
                                                      'high':'max',
                                                      'low': 'min',
                                                      'close': 'last', 'volume':'sum'})

Below is the output from printing resampled:

 open        high         low       close    volume
time                                                                         
2020-09-28 10:31:00  333.984985  334.470001  332.779999  333.750000   6482408
2020-09-28 11:31:00  333.760010  334.260010  333.109985  333.469910   4456465
2020-09-28 12:31:00  333.475006  334.500000  333.190002  334.239990   3711147
2020-09-28 13:31:00  334.239990  334.820007  334.174988  334.501099   4181924
2020-09-28 14:31:00  334.500000  334.959992  334.079987  334.600006   5698198
                         ...         ...         ...         ...       ...
2022-09-23 11:31:00  367.779999  368.170013  365.070007  365.119995   9603892
2022-09-23 12:31:00  365.109985  367.190002  364.825012  365.778412   9306106
2022-09-23 13:31:00  365.769989  366.649994  364.089996  364.829895   9172447
2022-09-23 14:31:00  364.820007  366.480011  363.290008  366.221405  14831712
2022-09-23 15:31:00  366.220001  368.040008  366.000000  367.440002  14253081

CodePudding user response:

Filter by time column is what you need? Try this;

begin_ts = '9/28/2020: 10:30' 
end_ts = '9/28/2020: 15:30'

df.set_index('time',inplace=True)
new_df = df.iloc[df.index.get_loc(begin_ts):df.index.get_loc(begin_ts)].copy()

Having said that, please consider indexing with a better index like timestamp.

CodePudding user response:

since the time period spans multiple days, it results in all hours of the day (24 for each day).

One way about it is to filter the result set for the time range using between_time

resampled.between_time('10:30:00', '13:59:00')
  • Related