How to use Pandas Timestamp fold argument?

Time:03-29

While experimenting with time zone conversions and the impact of DST, I am having a hard time understanding how pandas implements the fold parameter of the Timestamp constructor. The documentation says:

Due to daylight saving time, one wall clock time can occur twice when shifting from summer to winter time; fold describes whether the datetime-like corresponds to the first (0) or the second time (1) the wall clock hits the ambiguous time.

So far no surprise, but when I run the following code:

import pandas as pd
from datetime import datetime

pre_fold = pd.Timestamp(datetime(2022,10,30,1,30,0), tz="CET")
in_fold_fold0 = pd.Timestamp(datetime(2022,10,30,2,30,0), tz="CET")
in_fold_fold1 = pd.Timestamp(datetime(2022,10,30,2,30,0), tz="CET", fold=1)
post_fold = pd.Timestamp(datetime(2022,10,30,3,30,0), tz="CET")

print(f"fold0: {in_fold_fold0.fold}")
print(f"fold1: {in_fold_fold1.fold}")

print(f"Pre CET fold:       {pre_fold}  ->  UTC {pre_fold.tz_convert(tz='UTC')}")
print(f"In CET fold, fold0: {in_fold_fold0}  ->  UTC {in_fold_fold0.tz_convert(tz='UTC')}")
print(f"In CET fold, fold1: {in_fold_fold1}  ->  UTC {in_fold_fold1.tz_convert(tz='UTC')}")
print(f"Post CET fold:      {post_fold}  ->  UTC {post_fold.tz_convert(tz='UTC')}")

the output is not as expected:

fold0: 0
fold1: 1
Pre CET fold:       2022-10-30 01:30:00+02:00  ->  UTC 2022-10-29 23:30:00+00:00
In CET fold, fold0: 2022-10-30 02:30:00+01:00  ->  UTC 2022-10-30 01:30:00+00:00
In CET fold, fold1: 2022-10-30 02:30:00+01:00  ->  UTC 2022-10-30 01:30:00+00:00
Post CET fold:      2022-10-30 03:30:00+01:00  ->  UTC 2022-10-30 02:30:00+00:00

Line 4 of the output should be:

In CET fold, fold0: 2022-10-30 02:30:00+02:00  ->  UTC 2022-10-30 00:30:00+00:00

What am I missing here?

PS: Using Python's datetime objects with a dateutil time zone gives the expected output:

from datetime import datetime
from dateutil import tz

dt_pre_fold = datetime(2022,10,30,1,30,0, tzinfo=tz.gettz("CET"))
dt_in_fold_fold0 = datetime(2022,10,30,2,30,0, tzinfo=tz.gettz("CET"))
dt_in_fold_fold1 = datetime(2022,10,30,2,30,0, tzinfo=tz.gettz("CET"), fold=1)
dt_post_fold = datetime(2022,10,30,3,30,0, tzinfo=tz.gettz("CET"))

print(f"Pre CET fold:       {dt_pre_fold}  ->  UTC {dt_pre_fold.astimezone(tz.gettz('UTC'))}")
print(f"In CET fold, fold0: {dt_in_fold_fold0}  ->  UTC {dt_in_fold_fold0.astimezone(tz.gettz('UTC'))}")
print(f"In CET fold, fold1: {dt_in_fold_fold1}  ->  UTC {dt_in_fold_fold1.astimezone(tz.gettz('UTC'))}")
print(f"Post CET fold:      {dt_post_fold}  ->  UTC {dt_post_fold.astimezone(tz.gettz('UTC'))}")

Output:

Pre CET fold:       2022-10-30 01:30:00+02:00  ->  UTC 2022-10-29 23:30:00+00:00
In CET fold, fold0: 2022-10-30 02:30:00+02:00  ->  UTC 2022-10-30 00:30:00+00:00
In CET fold, fold1: 2022-10-30 02:30:00+01:00  ->  UTC 2022-10-30 01:30:00+00:00
Post CET fold:      2022-10-30 03:30:00+01:00  ->  UTC 2022-10-30 02:30:00+00:00

CodePudding user response:

It appears that the result depends on how the time zone is specified. With the string "CET", fold=0 has no effect:

# using your code
x = pd.Timestamp(datetime(2022,10,30,2,30,0), fold = 0, tz="CET")
x.tz_convert('UTC')
# Timestamp('2022-10-30 01:30:00+0000', tz='UTC')

But if you pass a dateutil time zone instead (from dateutil import tz):

x = pd.Timestamp(datetime(2022,10,30,2,30,0), fold = 0, tz=tz.gettz("CET"))
x.tz_convert('UTC')
# Timestamp('2022-10-30 00:30:00+0000', tz='UTC')

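it returns the expected value. The pandas user guide notes that fold is supported only with dateutil time zones because pytz time zones do not support fold, which would explain the difference if the string "CET" is resolved to a pytz time zone.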
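As a cross-check that involves neither pandas nor dateutil, here is a minimal sketch using the standard library's zoneinfo module (Python 3.9+). It assumes a tz database is available on the system (or via the tzdata package) and uses "Europe/Berlin" rather than "CET" as the zone name; both choices are assumptions made for illustration, not part of the original code.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9

berlin = ZoneInfo("Europe/Berlin")

# 02:30 on 2022-10-30 occurs twice in Berlin; fold selects the occurrence.
first = datetime(2022, 10, 30, 2, 30, tzinfo=berlin, fold=0)   # before the clocks go back
second = datetime(2022, 10, 30, 2, 30, tzinfo=berlin, fold=1)  # after the clocks go back

first_utc = first.astimezone(timezone.utc)
second_utc = second.astimezone(timezone.utc)

# Same wall time, different UTC offsets, one hour apart in UTC.
print(first.utcoffset(), "->", first_utc)
print(second.utcoffset(), "->", second_utc)
```

The two UTC instants differ by one hour, which is the behaviour the question expected from pandas as well.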

CodePudding user response:

This somewhat sidesteps the question, but I'm not sure why you'd want to use fold in the first place. You can localize a timestamp to a given time zone and use the ambiguous keyword to specify whether it should be the DST or the non-DST time. From the docs:

ambiguous [...] bool-ndarray where True signifies a DST time, False signifies a non-DST time (note that this flag is only applicable for ambiguous times)

So you could have done what you need like this:

import pandas as pd

f0 = pd.Timestamp("2022-10-30 02:30:00").tz_localize("Europe/Berlin", ambiguous=True)
f1 = pd.Timestamp("2022-10-30 02:30:00").tz_localize("Europe/Berlin", ambiguous=False)

print(f0.tz_convert('UTC'))
print(f1.tz_convert('UTC'))
# 2022-10-30 00:30:00+00:00 # was DST, UTC+2
# 2022-10-30 01:30:00+00:00 # was non-DST, UTC+1
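Since the ambiguous flag only matters for wall times that occur twice, it can help to test for that first. The following is a rough sketch using the standard library's zoneinfo (Python 3.9+, assuming a tz database is available); the helper name is_ambiguous is hypothetical and not a pandas API. It relies on the PEP 495 rule that, for a repeated wall time, the UTC offset with fold=0 is larger than with fold=1, while for a skipped (nonexistent) wall time it is smaller.

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # standard library since Python 3.9


def is_ambiguous(naive: datetime, zone: ZoneInfo) -> bool:
    """Return True if the naive wall time occurs twice in the given zone.

    Hypothetical helper for illustration; not part of pandas.
    """
    first = naive.replace(tzinfo=zone, fold=0)
    second = naive.replace(tzinfo=zone, fold=1)
    # Repeated wall time: the earlier occurrence has the larger UTC offset.
    # Skipped wall time (spring gap): the relation is reversed, so this is False.
    return first.utcoffset() > second.utcoffset()


berlin = ZoneInfo("Europe/Berlin")
for hour in (1, 2, 3):
    wall = datetime(2022, 10, 30, hour, 30)
    print(wall, is_ambiguous(wall, berlin))
```

Only the 02:30 wall time is reported as ambiguous; 01:30 and 03:30 map to exactly one instant each.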

side notes:

  • it is better to use actual IANA time zone names to avoid any ambiguities the abbreviations might have
  • don't mix native Python datetime and pandas' datetime, to avoid some of the rough edges of native Python datetime
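To illustrate the first side note, here is a small sketch (standard library zoneinfo, Python 3.9+, assuming a tz database is available) showing that a single IANA zone reports different abbreviations on either side of the fold. An abbreviation such as "CET" therefore describes an offset in effect at some moment rather than a full set of zone rules.

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # standard library since Python 3.9

berlin = ZoneInfo("Europe/Berlin")
wall = datetime(2022, 10, 30, 2, 30)  # ambiguous wall time in Berlin

# Abbreviation reported for the first (fold=0) and second (fold=1) occurrence.
names = [wall.replace(tzinfo=berlin, fold=f).tzname() for f in (0, 1)]
print(names)  # ['CEST', 'CET']
```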