Parse unix time with pd.to_datetime and datetime.datetime.fromtimestamp

I am trying to parse a Unix timestamp using pd.to_datetime() vs. dt.datetime.fromtimestamp(), but their outputs are different. Which one is correct?

import datetime as dt
import pandas as pd

ts = 1674853200000
print(pd.to_datetime(ts, unit='ms'))
print(dt.datetime.fromtimestamp(ts / 1e3))

>> 2023-01-27 21:00:00
>> 2023-01-27 13:00:00

Answer:

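Both are correct: the two values describe the same instant, expressed in different time zones. A Unix timestamp always counts from 1970-01-01 00:00 UTC. pd.to_datetime() returns a naive Timestamp whose wall-clock value is in UTC, while dt.datetime.fromtimestamp() converts the instant to the machine's local time zone and returns a naive local datetime. Beyond that, pd.to_datetime() is more flexible: it also accepts lists, arrays and Series, and it maps missing values to NaT. Which one to use depends on your use case.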
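A minimal sketch of the "same instant" point: attach the zone each value implicitly uses (UTC for pandas, the machine's local zone for fromtimestamp) and compare the results as instants. It also shows pd.to_datetime turning a missing value into NaT. The names ts_pd, ts_py, same_instant and parsed are only illustrative.

```python
import datetime as dt
import pandas as pd

ts = 1674853200000  # ms since 1970-01-01 00:00 UTC

# pandas: naive Timestamp whose wall-clock value is UTC
ts_pd = pd.to_datetime(ts, unit="ms")
# stdlib: naive datetime whose wall-clock value is the machine's local time
ts_py = dt.datetime.fromtimestamp(ts / 1e3)

# Attach the implied zones explicitly, then compare as instants
same_instant = ts_pd.tz_localize("UTC") == ts_py.astimezone()
print(same_instant)  # True, whatever the local time zone is

# pandas maps missing values to NaT instead of raising
parsed = pd.to_datetime([ts, None], unit="ms")
print(parsed.isna().tolist())  # [False, True]
```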

Answer:

In contrast to pandas (numpy) datetime, vanilla Python datetime defaults to local time if you do not specify a time zone such as UTC (i.e. if you use a naive datetime). Here's an illustration. If I reproduce your example in my Python environment, I get:

from datetime import datetime, timezone
import pandas as pd

# ms since the Unix epoch, 1970-01-01 00:00 UTC
unix = 1674853200000 

dt_py = datetime.fromtimestamp(unix/1e3)
dt_pd = pd.to_datetime(unix, unit="ms")

print(dt_py, dt_pd)
# 2023-01-27 22:00:00 # from fromtimestamp
# 2023-01-27 21:00:00 # from pd.to_datetime

Comparing the two datetime objects with my local UTC offset at that point in time explains the difference:

# my UTC offset at that point in time:
print(dt_py.astimezone().utcoffset())
# 1:00:00

# difference between dt_py and dt_pd:
print(dt_py-dt_pd)
# 0 days 01:00:00
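The same comparison can be written as a check that should hold on any machine, whatever its local zone. This is a sketch; local_offset is an illustrative name.

```python
from datetime import datetime
import pandas as pd

unix = 1674853200000  # ms since the Unix epoch

dt_py = datetime.fromtimestamp(unix / 1e3)  # naive, local wall clock
dt_pd = pd.to_datetime(unix, unit="ms")     # naive, UTC wall clock

# Local UTC offset at that instant, attached by astimezone() to the naive value
local_offset = dt_py.astimezone().utcoffset()

# The gap between the two naive values is exactly that offset
print(dt_py - dt_pd == local_offset)  # True
```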

To get consistent results between pandas and vanilla Python, i.e. to avoid the ambiguity, you can use aware datetime objects:

dt_py = datetime.fromtimestamp(unix/1e3, tz=timezone.utc)
dt_pd = pd.to_datetime(unix, unit="ms", utc=True)

print(dt_py, dt_pd)
# 2023-01-27 21:00:00+00:00
# 2023-01-27 21:00:00+00:00

print(dt_py-dt_pd)
# 0 days 00:00:00
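Two related cases, sketched under assumptions. If you need the naive-UTC value that pd.to_datetime returns by default, you can drop the tzinfo from the aware stdlib result. To show a whole column in another zone, parse with utc=True and then convert. The fixed UTC-8 offset below is only an example zone (the 13:00 output in the question would be consistent with it, but that is a guess); for a zone that observes DST you would pass a named zone such as zoneinfo.ZoneInfo instead.

```python
from datetime import datetime, timedelta, timezone
import pandas as pd

unix = 1674853200000  # ms since the Unix epoch

# Naive UTC from the standard library, matching pandas' default naive output
naive_utc = datetime.fromtimestamp(unix / 1e3, tz=timezone.utc).replace(tzinfo=None)
print(naive_utc == pd.to_datetime(unix, unit="ms"))  # True

# Assumption: a fixed UTC-8 offset, used here only as an example zone
utc_minus_8 = timezone(timedelta(hours=-8))

# Whole column: parse as UTC, then convert to the target zone
s = pd.to_datetime(pd.Series([unix]), unit="ms", utc=True).dt.tz_convert(utc_minus_8)
py = datetime.fromtimestamp(unix / 1e3, tz=utc_minus_8)

print(s.iloc[0])        # 2023-01-27 13:00:00-08:00
print(py)               # 2023-01-27 13:00:00-08:00
print(s.iloc[0] == py)  # True
```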