I have a Pandas DataFrame as the following:
timestamp
1583985600000
1584072000000
1584331200000
1584417600000
1584504000000
1584590400000
There are actually other columns as well but I pasted the above one for the sake of simplicity.
I need to change this column to date format by creating a separate column within the same DataFrame. I try the following:
df["date EST"] = pd.to_datetime(agg_daily_df["timestamp"],
unit='ms').dt.tz_localize('EST').astype(str)
... which gives the following result:
date EST
2020-03-12 04:00:00-05:00
2020-03-13 04:00:00-05:00
2020-03-16 04:00:00-05:00
2020-03-17 04:00:00-05:00
2020-03-18 04:00:00-05:00
2020-03-19 04:00:00-05:00
... which looks quite strange to me. The first row should actually give
2020-03-12 00:00:00
.
What is it I am doing wrong here so that I get results in a strange format?
CodePudding user response:
This will return time in UTC as tz-naive datetime.
pd.to_datetime(agg_daily_df["timestamp"], unit='ms')
# 1583985600000 => 2020-03-12 04:00:00
So, localizing this datetime, it results in
original: 1583985600000 =>
pd.to_datetime: 2020-03-12 04:00:00 (tz-naive) =>
tz_localize: 2020-03-12 04:00:00-05:00 (tz-aware, EST)
The issue is that you need to have tz-aware datetime before converting to other timezone.
# Add utc=True to get tz-aware time and convert to EST
(pd.to_datetime(agg_daily_df["timestamp"], unit='ms', utc=True)
dt.tz_convert('EST'))
This way the time will be converted like this.
original: 1583985600000 =>
to_datetime with utc=True: 2020-03-12 04:00:00 00:00 (tz-aware, UTC) =>
tz_convert: 2020-03-11 23:00:00-05:00 (tz-aware, EST)
Note that "EST" timezone doesn't handle day light saving. If you would like to have the day light saving handling, use locale for timezone.
(pd.to_datetime(agg_daily_df["timestamp"], unit='ms', utc=True)
.dt.tz_convert('America/New_York'))
This will give you 2020-03-12 00:00:00-04:00
.
========================================================
Update:
If you would like to have tz-naive time again, remove tzinfo by tz_localize(None)
(pd.to_datetime(agg_daily_df["timestamp"], unit='ms', utc=True)
.dt.tz_convert('America/New_York')
.tz_localize(None))
Or if you are just want to have time without showing timezone offset, use strftime
to format datetime into string.
(pd.to_datetime(agg_daily_df["timestamp"], unit='ms', utc=True)
.dt.tz_convert('America/New_York')
.transform(lambda x: x.strftime('%Y-%m-%d %H:%M:%S'))