Home > Software design >  Cannot convert timestamp to date in Python with correct timezone
Cannot convert timestamp to date in Python with correct timezone

Time:12-14

I have a Pandas DataFrame as the following:

timestamp
1583985600000
1584072000000
1584331200000
1584417600000
1584504000000
1584590400000

There are actually other columns as well but I pasted the above one for the sake of simplicity.

I need to change this column to date format by creating a separate column within the same DataFrame. I try the following:

df["date EST"] = pd.to_datetime(agg_daily_df["timestamp"],
                    unit='ms').dt.tz_localize('EST').astype(str)

... which gives the following result:

date EST
2020-03-12 04:00:00-05:00
2020-03-13 04:00:00-05:00
2020-03-16 04:00:00-05:00
2020-03-17 04:00:00-05:00
2020-03-18 04:00:00-05:00
2020-03-19 04:00:00-05:00

... which looks quite strange to me. The first row should actually give 2020-03-12 00:00:00.

What is it I am doing wrong here so that I get results in a strange format?

CodePudding user response:

This will return time in UTC as tz-naive datetime.

pd.to_datetime(agg_daily_df["timestamp"], unit='ms')

# 1583985600000 => 2020-03-12 04:00:00

So, localizing this datetime, it results in

original:       1583985600000 => 
pd.to_datetime: 2020-03-12 04:00:00 (tz-naive) => 
tz_localize:    2020-03-12 04:00:00-05:00 (tz-aware, EST)

The issue is that you need to have tz-aware datetime before converting to other timezone.

# Add utc=True to get tz-aware time and convert to EST
(pd.to_datetime(agg_daily_df["timestamp"], unit='ms', utc=True)
   dt.tz_convert('EST'))

This way the time will be converted like this.

original:                  1583985600000 => 
to_datetime with utc=True: 2020-03-12 04:00:00 00:00 (tz-aware, UTC) => 
tz_convert:                2020-03-11 23:00:00-05:00 (tz-aware, EST)

Note that "EST" timezone doesn't handle day light saving. If you would like to have the day light saving handling, use locale for timezone.

(pd.to_datetime(agg_daily_df["timestamp"], unit='ms', utc=True)
   .dt.tz_convert('America/New_York'))

This will give you 2020-03-12 00:00:00-04:00.

========================================================

Update:

If you would like to have tz-naive time again, remove tzinfo by tz_localize(None)

 (pd.to_datetime(agg_daily_df["timestamp"], unit='ms', utc=True)
   .dt.tz_convert('America/New_York')
   .tz_localize(None))

Or if you are just want to have time without showing timezone offset, use strftime to format datetime into string.

(pd.to_datetime(agg_daily_df["timestamp"], unit='ms', utc=True)
   .dt.tz_convert('America/New_York')
   .transform(lambda x: x.strftime('%Y-%m-%d %H:%M:%S'))
  • Related