I am using the resample funtion to go from minute data to hour data. The issue is my original DF has only from 10:30 to 15:59 data and the new resampled data is creating new hour data. How to I get rid of this data or have it resample only the time stamps on the index and not create new ones.
This is how the original DF looked:
This is how the resampled DF looks:
My question is: how to I get ride of the other hour data from the resample and just include the previous hour data from the original DF?
Below is the code that I used to created the resampledDF
ROD['time'] = pd.to_datetime(ROD['timestamp'])
ROD.set_index('time', inplace = True, drop = True)
resampled = ROD.resample('60Min',origin='start').agg({'open':'first',
'high':'max',
'low': 'min',
'close': 'last', 'volume':'sum'})
Below is the output from printing resampled:
open high low close volume
time
2020-09-28 10:31:00 333.984985 334.470001 332.779999 333.750000 6482408
2020-09-28 11:31:00 333.760010 334.260010 333.109985 333.469910 4456465
2020-09-28 12:31:00 333.475006 334.500000 333.190002 334.239990 3711147
2020-09-28 13:31:00 334.239990 334.820007 334.174988 334.501099 4181924
2020-09-28 14:31:00 334.500000 334.959992 334.079987 334.600006 5698198
... ... ... ... ...
2022-09-23 11:31:00 367.779999 368.170013 365.070007 365.119995 9603892
2022-09-23 12:31:00 365.109985 367.190002 364.825012 365.778412 9306106
2022-09-23 13:31:00 365.769989 366.649994 364.089996 364.829895 9172447
2022-09-23 14:31:00 364.820007 366.480011 363.290008 366.221405 14831712
2022-09-23 15:31:00 366.220001 368.040008 366.000000 367.440002 14253081
CodePudding user response:
Filter by time column is what you need? Try this;
begin_ts = '9/28/2020: 10:30'
end_ts = '9/28/2020: 15:30'
df.set_index('time',inplace=True)
new_df = df.iloc[df.index.get_loc(begin_ts):df.index.get_loc(begin_ts)].copy()
Having said that, please consider indexing with a better index like timestamp.
CodePudding user response:
since the time period spans multiple days, it results in all hours of the day (24 for each day).
One way about it is to filter the result set for the time range using between_time
resampled.between_time('10:30:00', '13:59:00')