I received a data export from an unspecified source system, which includes dates in the format of:
- 52164-01-19 00:00:00 00
- 52992-08-12 04:29:36 00
- 52838-10-19 04:08:32.999936 00
- 54022-03-12 17:20:36.999936 00
I was told that the error is caused by a faulty conversion of unix to datetime (seconds and milliseconds).
We came up with a possible approach to refactor the date in python, by separating the year into a "normal" year (2164) and convert the rest into milliseconds.
import time
import math
d0 = '52164-01-19 00:00:00 00'
d0_y = 52164
multiplier = 5
# avg gregorian year in seconds
avg_greg = (365.2425 * 24 * 3600)
d1_modulus = 52164 % (1000 * multiplier)
d1_rest = d0_y - d1_modulus
# replace orginal year with modulus
d1_time = time.strptime(str(d1_modulus) '-10-19 04:08:32', '%Y-%m-%d %H:%M:%S')
#convert to milliseconds and add d1_rest in "seconds"
bigtime = time.mktime(d1_time) (avg_greg * d1_rest) #in milliseconds
biggertime = bigtime / 1000 # in seconds
finaltime = time.ctime(biggertime)
# finaltime = 'Thu Mar 12 07:34:41 2020'
print(finaltime)
This code can break for different dates and result in multiplier/modulus combinations to create out of range values for time.mktime.
Can someone suggest an alternative or better approach to this?
Thanks in advance Gabe
CodePudding user response:
Python datetime only supports between years 1 and 9999.
So I installed astropy
and it works fine :
import datetime
import re
import astropy.time as astropy_time # installed with PIP
faulty_data = "52164-01-19 00:00:00 00"
timeformat = re.compile(r"(?P<year>\d{5})-(?P<month>\d{2})-(?P<day>\d{2}) (?P<hour>\d{2}):(?P<minute>\d{2}):(?P<second>\d{2})(?P<sign_tz>[ \- ])(?P<hour_tz>\d{2})")
match = timeformat.fullmatch(faulty_data)
assert match
assert len(match.group("year")) == 5
assert match.group("hour_tz") == "00"
missing_thousand_years = int(match.group("year")[0])
time = astropy_time.Time({"year": int(match.group("year")),
"month": int(match.group("month")),
"day": int(match.group("day")),
"hour": int(match.group("hour")),
"minute": int(match.group("minute")),
"second": int(match.group("second"))
},
scale="utc")
print(time)
milliseconds = time.unix
print(milliseconds)
actual_datetime = datetime.datetime.fromtimestamp(milliseconds / 1000)
print(actual_datetime)
(52164, 1, 19, 0, 0, 0.)
1583971200000.0
2020-03-12 01:00:00
So it seems that the original date was 2020-03-12 01:00:00
, which is close to what you got with your method.
NB: it raises two warnings, that you can silence