local daylight saving time to standard UTC time using python or R-CodePudding

I have a timeseries file containing 10 years of data with daylight saving time. Time is in local time in naive format, location is St Louis, USA where multiple time zone comes in a year. A sample of the time series is here:

local_time          flow
11/3/12 23:30       58145400
11/4/12 0:00        58147200
11/4/12 0:30        58149000
11/4/12 1:00        58150800
11/4/12 1:30        58152600
11/4/12 1:00        58150800
11/4/12 1:30        58152600
11/4/12 2:00        58154400
11/4/12 2:30        58156200
11/4/12 3:00        58158000
11/4/12 3:30        58159800
11/4/12 4:00        58161600
11/4/12 4:30        58163400

if you see closely after 11/4/12 1:30 58152600 time becomes 11/4/12 1:00. It's a sunday and the clock goes back 1 hour.

If there were no daylight saving thing then the ts should have looked this below:

    local_time          flow
    11/3/2012 23:30     58145400
    11/4/12 0:00        58147200
    11/4/12 0:30        58149000
    11/4/12 1:00        58150800
    11/4/12 1:30        58152600
    11/4/12 2:30        58150800
    11/4/12 3:00        58152600
    11/4/12 3:30        58154400
    11/4/12 4:00        58156200
    11/4/12 4:30        58158000
    11/4/12 5:30        58159800
    11/4/12 6:00        58161600
    11/4/12 6:30        58163400

Now, there are several instances like this in my original file. I want to convert the local data into UTC or CST where there will be no daylight saving time jump like the local time series data.

I tried this:

import pandas as pd
import numpy as np
df=pd.read_excel(r'test_dst.xlsx, sheet_name='Sheet1', header=0)
ts_naive=df.iloc[:,0]  
ts_cst = ts_naive.dt.tz_localize('America/Chicago') # 'America/Chicago' uses CDT

but it gives an error: AmbiguousTimeError: Cannot infer dst time from 2012-11-04 01:00:00, try using the 'ambiguous' argument

If I use the following it gives me wrong output:

ts_cst = ts_naive.dt.tz_localize('UTC').dt.tz_convert('America/Chicago')

because I am assigning 'UTC' time zone to a local data which is wrong.

My ultimate goal is to remove the daylight saving timejump from the timeseries so that I can convert it into an ever increasing ts in seconds. My model can only take time in julian seconds and time series can only increase. Thanks. Here is a sample excel file: test_dst.xlsx

CodePudding user response：

There's a useful section on this in the documentation, specifically the ambiguous="infer" argument~

df.local_time = pd.to_datetime(df.local_time) 
df.local_time = df.local_time.dt.tz_localize('America/Chicago', 'infer')
print(df.local_time)
print(df.local_time.dt.tz_convert("UTC"))

Output:

0    2012-11-03 23:30:00-05:00
1    2012-11-04 00:00:00-05:00
2    2012-11-04 00:30:00-05:00
3    2012-11-04 01:00:00-05:00
4    2012-11-04 01:30:00-05:00
5    2012-11-04 01:00:00-06:00
6    2012-11-04 01:30:00-06:00
7    2012-11-04 02:00:00-06:00
8    2012-11-04 02:30:00-06:00
9    2012-11-04 03:00:00-06:00
10   2012-11-04 03:30:00-06:00
11   2012-11-04 04:00:00-06:00
12   2012-11-04 04:30:00-06:00
Name: local_time, dtype: datetime64[ns, America/Chicago]

0    2012-11-04 04:30:00 00:00
1    2012-11-04 05:00:00 00:00
2    2012-11-04 05:30:00 00:00
3    2012-11-04 06:00:00 00:00
4    2012-11-04 06:30:00 00:00
5    2012-11-04 07:00:00 00:00
6    2012-11-04 07:30:00 00:00
7    2012-11-04 08:00:00 00:00
8    2012-11-04 08:30:00 00:00
9    2012-11-04 09:00:00 00:00
10   2012-11-04 09:30:00 00:00
11   2012-11-04 10:00:00 00:00
12   2012-11-04 10:30:00 00:00
Name: local_time, dtype: datetime64[ns, UTC]