I've created a class that takes in minute data and returns the daily OHLC (open, high, low, close) for a given day. A simplified version looks like this:
import pandas as pd
from datetime import time
from IPython.display import display
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay

US_BUSINESS_DAY = CustomBusinessDay(calendar=USFederalHolidayCalendar())


class SessionData:
    def __init__(self, data, date):
        self.data = pd.read_csv(data)
        self.date = date
        df = self.data
        display(df)

        # get the minute data and return only the specified date (2022-04-18)
        df_current_day = df[(df['date'] >= date) & (df['date'] <= date)]
        df_current_day['time'] = pd.to_datetime(df['time']).dt.time

        self.previous_day = date - 2 * US_BUSINESS_DAY  # 2018-7-2
        # get the minute data of the previous trading day (2022-04-14)
        df_previous_day = df[(df['date'] >= self.previous_day) & (df['date'] <= self.previous_day)]
Here's what my data originally looks like:
            v      vw       o     c       h     l                    t    n        date      time
0       605.0  4.2036  4.2000  4.20  4.2000  4.20  2022-04-07 13:30:00    3  2022-04-07  13:30:00
1       809.0  4.2013  4.2026  4.20  4.2026  4.20  2022-04-07 13:41:00   12  2022-04-07  13:41:00
2       115.0  4.1739  4.1700  4.17  4.1700  4.17  2022-04-07 13:43:00    3  2022-04-07  13:43:00
3       170.0  4.1495  4.1500  4.15  4.1500  4.15  2022-04-07 13:53:00    6  2022-04-07  13:53:00
4       100.0  4.1600  4.1600  4.16  4.1600  4.16  2022-04-07 13:57:00    1  2022-04-07  13:57:00
...       ...     ...     ...   ...     ...   ...                  ...  ...         ...       ...
1397   6260.0  6.5252  6.5300  6.53  6.5600  6.51  2022-04-18 23:55:00   32  2022-04-18  23:55:00
1398   8610.0  6.5399  6.5300  6.55  6.5500  6.52  2022-04-18 23:56:00   28  2022-04-18  23:56:00
1399   9035.0  6.5493  6.5500  6.55  6.5600  6.54  2022-04-18 23:57:00   24  2022-04-18  23:57:00
1400  30328.0  6.5188  6.5600  6.50  6.5600  6.50  2022-04-18 23:58:00   66  2022-04-18  23:58:00
1401  25403.0  6.5152  6.5000  6.52  6.5500  6.49  2022-04-18 23:59:00   62  2022-04-18  23:59:00
1402 rows × 10 columns
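Since the stated goal is a daily OHLC per session, here is a minimal sketch of that aggregation. It assumes the columns follow the usual minute-bar convention (o/h/l/c = open/high/low/close, v = volume, t = bar timestamp); the function name daily_ohlc is hypothetical, and the sample rows are only the ten visible in the printout, so the result covers just those rows.

```python
import io

import pandas as pd

# The ten rows visible in the printout (a small subset of the real file).
SAMPLE_CSV = """v,vw,o,c,h,l,t,n,date,time
605.0,4.2036,4.2000,4.20,4.2000,4.20,2022-04-07 13:30:00,3,2022-04-07,13:30:00
809.0,4.2013,4.2026,4.20,4.2026,4.20,2022-04-07 13:41:00,12,2022-04-07,13:41:00
115.0,4.1739,4.1700,4.17,4.1700,4.17,2022-04-07 13:43:00,3,2022-04-07,13:43:00
170.0,4.1495,4.1500,4.15,4.1500,4.15,2022-04-07 13:53:00,6,2022-04-07,13:53:00
100.0,4.1600,4.1600,4.16,4.1600,4.16,2022-04-07 13:57:00,1,2022-04-07,13:57:00
6260.0,6.5252,6.5300,6.53,6.5600,6.51,2022-04-18 23:55:00,32,2022-04-18,23:55:00
8610.0,6.5399,6.5300,6.55,6.5500,6.52,2022-04-18 23:56:00,28,2022-04-18,23:56:00
9035.0,6.5493,6.5500,6.55,6.5600,6.54,2022-04-18 23:57:00,24,2022-04-18,23:57:00
30328.0,6.5188,6.5600,6.50,6.5600,6.50,2022-04-18 23:58:00,66,2022-04-18,23:58:00
25403.0,6.5152,6.5000,6.52,6.5500,6.49,2022-04-18 23:59:00,62,2022-04-18,23:59:00
"""


def daily_ohlc(minute_bars: pd.DataFrame) -> pd.DataFrame:
    """Collapse minute bars into one open/high/low/close/volume row per date."""
    bars = minute_bars.copy()
    bars["t"] = pd.to_datetime(bars["t"])
    bars["date"] = pd.to_datetime(bars["date"])
    bars = bars.sort_values("t")  # "first"/"last" rely on chronological order
    return bars.groupby("date").agg(
        o=("o", "first"),  # open of the first bar of the day
        h=("h", "max"),    # highest high
        l=("l", "min"),    # lowest low
        c=("c", "last"),   # close of the last bar of the day
        v=("v", "sum"),    # total volume
    )


daily = daily_ohlc(pd.read_csv(io.StringIO(SAMPLE_CSV)))
print(daily)
```

Named aggregation keeps the output column names identical to the input ones, so the daily frame can be read the same way as the minute frame.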
Filtering for the current date works fine:
             v      vw     o     c       h     l                    t    n        date      time
687      852.0  4.1498  3.98  4.41  4.4100  3.98  2022-04-18 12:00:00   13  2022-04-18  12:00:00
688     2901.0  4.4839  4.13  4.75  4.7500  4.13  2022-04-18 12:01:00   24  2022-04-18  12:01:00
689    44063.0  4.9450  4.88  4.66  5.2599  4.60  2022-04-18 12:02:00  236  2022-04-18  12:02:00
690    46314.0  4.6890  4.70  4.62  4.8000  4.55  2022-04-18 12:03:00  225  2022-04-18  12:03:00
691   142991.0  4.8611  4.66  5.03  5.0900  4.61  2022-04-18 12:04:00  581  2022-04-18  12:04:00
...        ...     ...   ...   ...     ...   ...                  ...  ...         ...       ...
1397    6260.0  6.5252  6.53  6.53  6.5600  6.51  2022-04-18 23:55:00   32  2022-04-18  23:55:00
1398    8610.0  6.5399  6.53  6.55  6.5500  6.52  2022-04-18 23:56:00   28  2022-04-18  23:56:00
1399    9035.0  6.5493  6.55  6.55  6.5600  6.54  2022-04-18 23:57:00   24  2022-04-18  23:57:00
1400   30328.0  6.5188  6.56  6.50  6.5600  6.50  2022-04-18 23:58:00   66  2022-04-18  23:58:00
1401   25403.0  6.5152  6.50  6.52  6.5500  6.49  2022-04-18 23:59:00   62  2022-04-18  23:59:00
But when I try to create a new DataFrame for the previous business day, I get the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-346-dd4f43872f9b> in <module>
182 # # print(pm)
183
--> 184 pmData = SessionData('data\CASA.csv', '2022-04-18')
185
186 # pmData.get_trading_session_times()
<ipython-input-346-dd4f43872f9b> in __init__(self, data, date)
50 display(df_current_day)
51
---> 52 self.previous_day = date - 2 * US_BUSINESS_DAY # 2018-7-2
53 # get the minute data aof previous trading day (2022-04-14)
54 df_previous_day = df[(df['date'] >= self.previous_day) & (df['date'] <= self.previous_day)]
TypeError: unsupported operand type(s) for -: 'str' and 'pandas._libs.tslibs.offsets.CustomBusinessDay'
How do I fix this?
Update
Types:
<class 'pandas._libs.tslibs.offsets.CustomBusinessDay'> US_BUSINESS_DAY type
<class 'pandas._libs.tslibs.timestamps.Timestamp'> self.date type
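Note that the failing line in the traceback is date - 2 * US_BUSINESS_DAY, which uses the raw date argument rather than self.date; in the call shown, that argument is the string '2022-04-18', so the type of self.date does not matter on that line. A minimal reproduction, assuming nothing beyond the offset setup from the question:

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay

US_BUSINESS_DAY = CustomBusinessDay(calendar=USFederalHolidayCalendar())

date = "2022-04-18"  # the value passed to SessionData in the traceback

error_message = None
try:
    date - 2 * US_BUSINESS_DAY  # str minus offset, as on the failing line
except TypeError as exc:
    error_message = str(exc)
print("str:", error_message)

# The same arithmetic succeeds once the string is parsed into a Timestamp.
previous_day = pd.Timestamp(date) - 2 * US_BUSINESS_DAY
print("Timestamp:", previous_day)
```

Good Friday (2022-04-15) is not part of USFederalHolidayCalendar, so two business days before Monday 2022-04-18 is Thursday 2022-04-14, the date expected in the question's comment.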
Answer:
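Convert date to a datetime (datetime64 / Timestamp) with pd.to_datetime before subtracting the offset. A plain string can't be combined with a CustomBusinessDay, but a Timestamp can: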
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay
US_BUSINESS_DAY = CustomBusinessDay(calendar=USFederalHolidayCalendar())
date = '2018-7-2'
pd.to_datetime(date) - 2 * US_BUSINESS_DAY  # two US business days before 2018-07-02
Timestamp('2018-06-28 00:00:00')
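Applied to the class from the question, a sketch might look like the following. It assumes the CSV's date column holds strings such as 2022-04-18 (as in the printout), so that column is parsed too, keeping both sides of each comparison as datetimes. The attribute names current_day and previous_day_data are hypothetical, the display calls are dropped so it runs outside Jupyter, and .copy() is used so adding the time column does not trigger SettingWithCopyWarning.

```python
import io

import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay

US_BUSINESS_DAY = CustomBusinessDay(calendar=USFederalHolidayCalendar())


class SessionData:
    def __init__(self, data, date):
        df = pd.read_csv(data)
        # Parse the date column so it can be compared with Timestamps.
        df["date"] = pd.to_datetime(df["date"])
        self.data = df

        # Convert the argument once and use the Timestamp everywhere below.
        self.date = pd.to_datetime(date)
        self.previous_day = self.date - 2 * US_BUSINESS_DAY

        # Minute rows of the requested day (hypothetical attribute name).
        self.current_day = df[df["date"] == self.date].copy()
        self.current_day["time"] = pd.to_datetime(
            self.current_day["time"], format="%H:%M:%S"
        ).dt.time

        # Minute rows of the trading day two business days earlier.
        self.previous_day_data = df[df["date"] == self.previous_day].copy()


# Hypothetical usage: two rows from each day shown in the question's printout.
SAMPLE_CSV = """v,o,c,h,l,date,time
605.0,4.2000,4.20,4.2000,4.20,2022-04-07,13:30:00
809.0,4.2026,4.20,4.2026,4.20,2022-04-07,13:41:00
6260.0,6.5300,6.53,6.5600,6.51,2022-04-18,23:55:00
8610.0,6.5300,6.55,6.5500,6.52,2022-04-18,23:56:00
"""

session = SessionData(io.StringIO(SAMPLE_CSV), "2022-04-18")
print(session.previous_day)            # 2022-04-14 00:00:00
print(len(session.current_day))        # 2
print(len(session.previous_day_data))  # 0: the sample has no 2022-04-14 rows
```

For a single date, df["date"] == self.date selects the same rows as the original >= / <= pair, and parsing everything once in __init__ means later methods never have to mix strings and Timestamps.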