Home > other >  Python datetime on Futures data
Python datetime on Futures data

Time:11-24

I am trying to analyze Futures data using a data file I downloaded from Ninja Trader. I imported the file in Python using PyCharm IDE.

The text file format is this:

20211031 220000 0000000;4608;4608;4608;1

20211031 220000 0000000;4608;4608;4608;4

20211031 220000 0000000;4608;4608;4608;1

20211031 220000 0000000;4608;4608;4608;1

These are tick data with timestamp, price, price, price, volume.

I imported them using:

    data = pd.read_csv(r"C:\__dir__test.txt",sep=';')

(I have deleted the actual dir address)

Assigned column names:

    data.columns = ["date","P1","P2","P3","V"]

Then I used the pd.to_datetime unsuccessfully:

    data["date"] = pd.to_datetime(data ["date"])

and

    data['Dates'] = pd.to_datetime(data['date']).dt.date
    data['Time'] = pd.to_datetime(data['date']).dt.time

Then I am getting an error:

File "C:\..dir..venv\lib\site-packages\pandas\core\arrays\datetimes.py", line 2187, in objects_to_datetime64ns
values, tz_parsed = conversion.datetime_to_datetime64(data.ravel("K"))

File "pandas\_libs\tslibs\conversion.pyx", line 359, in pandas._libs.tslibs.conversion.datetime_to_datetime64

TypeError: Unrecognized value type: <class 'str'>

During handling of the above exception, another exception occurred: Traceback (most recent call last):

File "<input>", line 1, in <module>

File "C:\Program Files\JetBrains\PyCharm Community Edition 2021.2.1\plugins\python-ce\helpers\pydev\_pydev_bundle\pydev_umd.py", line 198, in runfile
pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script

File "C:\Program Files\JetBrains\PyCharm Community Edition 2021.2.1\plugins\python-ce\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
exec(compile(contents "\n", file, 'exec'), glob, loc)

File "C:/__dir__/Data from NT.py", line 10, in <module>
data["date"] = pd.to_datetime(data ["date"])

File "C:\__dir__\venv\lib\site-packages\pandas\core\tools\datetimes.py", line 883, in to_datetime 
cache_array = _maybe_cache(arg, format, cache, convert_listlike)

File "C:\__dir__\venv\lib\site-packages\pandas\core\tools\datetimes.py", line 195, in _maybe_cache
cache_dates = convert_listlike(unique_dates, format)

File "C:\__dir__\venv\lib\site-packages\pandas\core\tools\datetimes.py", line 401, in _convert_listlike_datetimes
result, tz_parsed = objects_to_datetime64ns(

File "C:\__dir__\venv\lib\site-packages\pandas\core\arrays\datetimes.py", line 2193, in objects_to_datetime64ns
raise err

File "C:\__dir__\venv\lib\site-packages\pandas\core\arrays\datetimes.py", line 2175, in objects_to_datetime64ns result, tz_parsed = tslib.array_to_datetime(

File "pandas\_libs\tslib.pyx", line 379, in pandas._libs.tslib.array_to_datetime

File "pandas\_libs\tslib.pyx", line 611, in pandas._libs.tslib.array_to_datetime

File "pandas\_libs\tslib.pyx", line 749, in pandas._libs.tslib._array_to_datetime_object

File "pandas\_libs\tslib.pyx", line 740, in pandas._libs.tslib._array_to_datetime_object

File "pandas\_libs\tslibs\parsing.pyx", line 257, in pandas._libs.tslibs.parsing.parse_datetime_string

File "C:\__dir__\venv\lib\site-packages\dateutil\parser\_parser.py", line 1368, in parse
return DEFAULTPARSER.parse(timestr, **kwargs)

File "C:\__dir__\venv\lib\site-packages\dateutil\parser\_parser.py", line 643, in parse
raise ParserError("Unknown string format: %s", timestr)
dateutil.parser._parser.ParserError: Unknown string format: 20211031 220000 0000000

The purpose is to split date and time to do analysis in specific time intervals (by slicing the data). I used the same set up in a data set I downloaded from Interactive Brokers using the ib_insync framework and worked fine.

CodePudding user response:

import pandas as pd

df = pd.DataFrame({"date_raw":["20211031 220000 0000000","20211031 220000 0000000", "20211031 220000 0000000", "20211031 220000 0000000"]})

# parse date_raw to datetime field
df["date"] = pd.to_datetime(df["date_raw"],format="%Y%m%d %H%M%S %f")

# get Date and Time fields
df['Date'] = df['date'].dt.date
df['Time'] = df['date'].dt.time

print(df)

Result:

                  date_raw                date        Date      Time
0  20211031 220000 0000000 2021-10-31 22:00:00  2021-10-31  22:00:00
1  20211031 220000 0000000 2021-10-31 22:00:00  2021-10-31  22:00:00
2  20211031 220000 0000000 2021-10-31 22:00:00  2021-10-31  22:00:00
3  20211031 220000 0000000 2021-10-31 22:00:00  2021-10-31  22:00:00
  • Related