I'm trying to parse the datetime specified in the OFX 2.3 spec in Python. I believe it's a custom format, but feel free to let me know if it has a name. The spec states the following:
There is one format for representing dates, times, and time zones. The complete form is: YYYYMMDDHHMMSS.XXX [gmt offset[:tz name]]
For example, “19961005132200.124[-5:EST]” represents October 5, 1996, at 1:22 and 124 milliseconds p.m., in Eastern Standard Time. This is the same as 6:22 p.m. Greenwich Mean Time (GMT).
Here is my current attempt:
from datetime import datetime
date_str = "19961005132200.124[EST]"
date = datetime.strptime(date_str, "%Y%m%d%H%M%S.%f[%Z]")
This partial example works so far, but it lacks the GMT offset portion (the -5 in [-5:EST]). I'm not sure how to specify a time zone offset of at most two digits.
CodePudding user response:
Some things to note here, first (as commented):
- Python's built-in strptime will have a hard time here: %z won't parse a single-digit offset hour, and %Z won't parse some (potentially) ambiguous time zone abbreviations.
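The %z limitation is easy to reproduce. Here is a minimal sketch (the helper name try_parse is made up for illustration). It only exercises %z, since what %Z accepts depends on the local time zone settings of the machine:

```python
from datetime import datetime

def try_parse(s, fmt):
    """Return the parsed datetime, or None if strptime rejects the input."""
    try:
        return datetime.strptime(s, fmt)
    except ValueError:
        return None

fmt = "%Y%m%d%H%M%S.%f%z"
# a four-digit offset such as -0500 is accepted by %z ...
ok = try_parse("19961005132200.124-0500", fmt)
# ... but a bare single-digit hour offset such as -5 is rejected
bad = try_parse("19961005132200.124-5", fmt)
print(ok, bad)
```

So even without the brackets and the name, the offset as written in the OFX example would need some pre-processing before strptime can handle it.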
Then, the OFX Banking Version 2.3 docs (sect. 3.2.8.2 Date and Datetime) leave some questions open to me:
- Is the UTC offset optional?
- Why is EST called a time zone when it's just an abbreviation?
- Why is the UTC offset in the example -5 hours, when on 1996-10-05 US/Eastern was at UTC-4?
- What about offsets that have minutes specified, e.g. 5:30 for Asia/Calcutta?
- (opinionated) Why re-invent the wheel in the first place instead of using a commonly used standard like ISO 8601?
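The UTC-4 claim can be checked against the IANA time zone database. This is just a sketch: it assumes Python 3.9+ and that tz data is available (system files or the tzdata package), and it uses "America/New_York", the canonical name for the zone that "US/Eastern" points to:

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

try:
    # needs an IANA tz database (system zoneinfo files or the tzdata package)
    eastern = ZoneInfo("America/New_York")
except ZoneInfoNotFoundError:
    eastern = None

if eastern is not None:
    dt = datetime(1996, 10, 5, 13, 22, tzinfo=eastern)
    # abbreviation and UTC offset in hours that were in effect on that date
    print(dt.tzname(), dt.utcoffset() / timedelta(hours=1))  # EDT -4.0
```

Daylight saving time was still in effect on that date, so the "EST" label and the -5 offset in the spec's example do not match what US/Eastern actually observed.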
Anyway, here's an attempt at a custom parser:
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo
def parseOFXdatetime(s, tzinfos=None, _tz=None):
    """
    Parse an OFX datetime string to an aware Python datetime object.
    """
    # first, treat formats that have no UTC offset specified.
    if '[' not in s:
        # just make sure default format is satisfied by filling with zeros if needed
        s = s.ljust(14, '0') + '.000' if '.' not in s else s
        return datetime.strptime(s, "%Y%m%d%H%M%S.%f").replace(tzinfo=timezone.utc)
    # offset and tz are specified, so first get the date/time, offset and tzname components
    s, off = s.strip(']').split('[')
    off, name = off.split(':')
    s = s.ljust(14, '0') + '.000' if '.' not in s else s
    # if tzinfos are specified, map the tz name:
    if tzinfos:
        _tz = tzinfos.get(name) # this might still leave _tz as None...
    if not _tz: # ...so we derive a tz from a timedelta
        _tz = timezone(timedelta(hours=int(off)), name=name)
    return datetime.strptime(s, "%Y%m%d%H%M%S.%f").replace(tzinfo=_tz)
# some test strings
t = ["19961005132200.124[-5:EST]", "19961005132200.124", "199610051322", "19961005",
     "199610051322[-5:EST]", "19961005[-5:EST]"]
for s in t:
    print(# normal parsing
          f'{s}\n {repr(parseOFXdatetime(s))}\n'
          # parsing with tzinfo mapping supplied; abbreviation -> timezone object
          f' {repr(parseOFXdatetime(s, tzinfos={"EST": ZoneInfo("US/Eastern")}))}\n\n')
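As a sanity check, the spec says the first test string is the same as 6:22 p.m. GMT. The sketch below builds, by hand, the fixed-offset datetime I'd expect the parser to return for that string (so it runs standalone, without calling parseOFXdatetime) and converts it to UTC:

```python
from datetime import datetime, timedelta, timezone

# expected result for "19961005132200.124[-5:EST]" when no tzinfos mapping is given:
# a fixed UTC-5 offset carrying the name from the string
est = timezone(timedelta(hours=-5), name="EST")
local = datetime(1996, 10, 5, 13, 22, 0, 124000, tzinfo=est)

# convert to UTC; the spec states this should be 6:22 p.m. GMT
as_utc = local.astimezone(timezone.utc)
print(as_utc.isoformat(timespec="milliseconds"))  # 1996-10-05T18:22:00.124+00:00
```

Note that mapping "EST" to a DST-aware zone such as US/Eastern gives a different instant for this date (17:22 UTC), because that zone was at UTC-4 at the time. Which behaviour is right depends on whether you trust the numeric offset or the name in the OFX string.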