How to properly convert a UNIX timestamp to pd.Timestamp object via pandas?

Time:12-24

I found an inconsistency in how pandas converts UNIX timestamps to Python datetime objects:

import datetime
import pandas as pd

d = datetime.datetime.utcnow()
timestamp = d.timestamp()

assert datetime.datetime.fromtimestamp(timestamp) == d
assert pd.to_datetime(timestamp, unit="s").to_pydatetime() == d

The first assertion passes, while the second fails. Pandas seems to be converting the UTC timestamp into my local timezone.

It's hard to believe that this is a bug, so what am I doing wrong?

Thanks!

Answer:

The problem is simple but not obvious. utcnow() gives you a naive datetime object, meaning it does not know that it represents UTC. Therefore, when you call .timestamp(), Python assumes the object is in local time, because that is how naive datetimes are treated. It converts from local time to UTC before calculating the Unix time, so the result is off by whatever UTC offset your local timezone has.
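To make the offset visible, here is a minimal sketch using only the standard library. The fixed date and the variable names are illustrative assumptions, not part of the question; the point is only that the same wall-clock fields give different Unix times depending on whether the object is naive or marked as UTC.

```python
from datetime import datetime, timezone

# A fixed wall-clock value, chosen only for illustration.
naive = datetime(2021, 12, 24, 12, 0, 0)      # no tzinfo attached
aware = naive.replace(tzinfo=timezone.utc)    # same fields, explicitly UTC

# timestamp() on a naive datetime interprets the fields as local time.
naive_ts = naive.timestamp()
# timestamp() on an aware datetime uses the attached tzinfo.
aware_ts = aware.timestamp()

# UTC offset of the local timezone at that wall-clock time, in seconds.
local_offset = naive.astimezone().utcoffset().total_seconds()

# The two Unix times differ by exactly the local UTC offset
# (which is zero if the machine itself runs on UTC).
print(aware_ts - naive_ts, local_offset)
```

On a machine whose local timezone is UTC both printed numbers are zero, which is why the problem in the question can go unnoticed on a server and only show up on a laptop.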

Solution: construct a datetime object that is aware of UTC. The same goes for fromtimestamp: set tz to UTC.

from datetime import datetime, timezone
import pandas as pd

d = datetime.now(timezone.utc)
timestamp = d.timestamp()

assert datetime.fromtimestamp(timestamp, tz=timezone.utc) == d
assert pd.to_datetime(timestamp, unit="s", utc=True).to_pydatetime() == d

pandas is a somewhat different story: it treats naive datetimes internally as UTC, so pd.to_datetime(timestamp, unit="s") gives you a naive Timestamp holding the UTC date and time. But the conversion to a Python datetime via to_pydatetime() keeps it naive, and Python will treat a naive datetime as local time again. Here, keeping things consistent by setting utc=True (i.e. using an aware Timestamp) makes it work nicely.
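The difference between the naive and the aware pandas result can be sketched as follows. The Unix time used here is an arbitrary example value (2021-12-24 12:00:00 UTC), and the variable names are assumptions made for illustration.

```python
from datetime import datetime, timezone
import pandas as pd

ts = 1640347200  # example value: 2021-12-24 12:00:00 UTC

naive_pd = pd.to_datetime(ts, unit="s")            # naive Timestamp, fields are UTC
aware_pd = pd.to_datetime(ts, unit="s", utc=True)  # tz-aware Timestamp in UTC

# The naive result has no timezone attached; the aware one does.
print(naive_pd.tz is None, aware_pd.tz is not None)

# The naive result carries the UTC wall-clock fields, whatever the local timezone is.
assert naive_pd.to_pydatetime() == datetime(2021, 12, 24, 12, 0)

# Aware objects compare by instant, so this holds in any local timezone.
assert aware_pd.to_pydatetime() == datetime.fromtimestamp(ts, tz=timezone.utc)
```

Comparing the naive pandas result against datetime.fromtimestamp(ts) without tz would only succeed on a machine whose local timezone is UTC, which is the mismatch the question ran into.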
