I have a timeseries file containing 10 years of data with daylight saving time. Time is in local time in naive format, location is St Louis, USA where multiple time zone comes in a year. A sample of the time series is here:
local_time flow
11/3/12 23:30 58145400
11/4/12 0:00 58147200
11/4/12 0:30 58149000
11/4/12 1:00 58150800
11/4/12 1:30 58152600
11/4/12 1:00 58150800
11/4/12 1:30 58152600
11/4/12 2:00 58154400
11/4/12 2:30 58156200
11/4/12 3:00 58158000
11/4/12 3:30 58159800
11/4/12 4:00 58161600
11/4/12 4:30 58163400
if you see closely after 11/4/12 1:30 58152600
time becomes 11/4/12 1:00
. It's a sunday and the clock goes back 1 hour.
If there were no daylight saving thing then the ts should have looked this below:
local_time flow
11/3/2012 23:30 58145400
11/4/12 0:00 58147200
11/4/12 0:30 58149000
11/4/12 1:00 58150800
11/4/12 1:30 58152600
11/4/12 2:30 58150800
11/4/12 3:00 58152600
11/4/12 3:30 58154400
11/4/12 4:00 58156200
11/4/12 4:30 58158000
11/4/12 5:30 58159800
11/4/12 6:00 58161600
11/4/12 6:30 58163400
Now, there are several instances like this in my original file. I want to convert the local data into UTC or CST where there will be no daylight saving time jump like the local time series data.
I tried this:
import pandas as pd
import numpy as np
df=pd.read_excel(r'test_dst.xlsx, sheet_name='Sheet1', header=0)
ts_naive=df.iloc[:,0]
ts_cst = ts_naive.dt.tz_localize('America/Chicago') # 'America/Chicago' uses CDT
but it gives an error: AmbiguousTimeError: Cannot infer dst time from 2012-11-04 01:00:00, try using the 'ambiguous' argument
If I use the following it gives me wrong output:
ts_cst = ts_naive.dt.tz_localize('UTC').dt.tz_convert('America/Chicago')
because I am assigning 'UTC' time zone to a local data which is wrong.
My ultimate goal is to remove the daylight saving timejump from the timeseries so that I can convert it into an ever increasing ts in seconds. My model can only take time in julian seconds and time series can only increase. Thanks. Here is a sample excel file: test_dst.xlsx
CodePudding user response:
There's a useful section on this in the documentation, specifically the ambiguous="infer"
argument~
df.local_time = pd.to_datetime(df.local_time)
df.local_time = df.local_time.dt.tz_localize('America/Chicago', 'infer')
print(df.local_time)
print(df.local_time.dt.tz_convert("UTC"))
Output:
0 2012-11-03 23:30:00-05:00
1 2012-11-04 00:00:00-05:00
2 2012-11-04 00:30:00-05:00
3 2012-11-04 01:00:00-05:00
4 2012-11-04 01:30:00-05:00
5 2012-11-04 01:00:00-06:00
6 2012-11-04 01:30:00-06:00
7 2012-11-04 02:00:00-06:00
8 2012-11-04 02:30:00-06:00
9 2012-11-04 03:00:00-06:00
10 2012-11-04 03:30:00-06:00
11 2012-11-04 04:00:00-06:00
12 2012-11-04 04:30:00-06:00
Name: local_time, dtype: datetime64[ns, America/Chicago]
0 2012-11-04 04:30:00 00:00
1 2012-11-04 05:00:00 00:00
2 2012-11-04 05:30:00 00:00
3 2012-11-04 06:00:00 00:00
4 2012-11-04 06:30:00 00:00
5 2012-11-04 07:00:00 00:00
6 2012-11-04 07:30:00 00:00
7 2012-11-04 08:00:00 00:00
8 2012-11-04 08:30:00 00:00
9 2012-11-04 09:00:00 00:00
10 2012-11-04 09:30:00 00:00
11 2012-11-04 10:00:00 00:00
12 2012-11-04 10:30:00 00:00
Name: local_time, dtype: datetime64[ns, UTC]