I am trying to set timezone to a datetime column, based on another column containing the time zone.
Example data:
DATETIME VALUE TIME_ZONE
0 2021-05-01 00:00:00 1.00 Europe/Athens
1 2021-05-01 00:00:00 2.13 Europe/London
2 2021-05-01 00:00:00 5.13 Europe/London
3 2021-05-01 01:00:00 4.25 Europe/Dublin
4 2021-05-01 01:00:00 4.25 Europe/Paris
I am trying to assign a time zone to the DATETIME
column, but using the tz_localize
method, I cannot avoid using an apply call, which will be very slow on my large dataset. Is there some way to do this without using apply?
What I have now (which is slow):
df['DATETIME_WITH_TZ'] = df.apply(lambda row: row['DATETIME'].tz_localize(row['TIME_ZONE']), axis=1)
CodePudding user response:
I'm not sure but a listcomp seems to be x17 faster than apply
in your case :
df["DATETIME_WITH_TZ"] = [dt.tz_localize(tz)
for dt,tz in zip(df["DATETIME"], df["TIME_ZONE"])]
Another variant, with tz_convert
:
df["DATETIME_WITH_TZ"] = [dt.tz_localize("UTC").tz_convert(tz)
for dt,tz in zip(df["DATETIME"], df["TIME_ZONE"])]
Timing :
#%%timeit #listcomp1
47.4 µs ± 1.32 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
#%%timeit #listcomp2
25.7 µs ± 1.94 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
#%%timeit #apply
457 µs ± 16.9 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Output :
print(df)
DATETIME VALUE TIME_ZONE DATETIME_WITH_TZ
0 2021-05-01 00:00:00 1.00 Europe/Athens 2021-05-01 03:00:00 03:00
1 2021-05-01 00:00:00 2.13 Europe/London 2021-05-01 01:00:00 01:00
2 2021-05-01 00:00:00 5.13 Europe/London 2021-05-01 01:00:00 01:00
3 2021-05-01 01:00:00 4.25 Europe/Dublin 2021-05-01 02:00:00 01:00
4 2021-05-01 01:00:00 4.25 Europe/Paris 2021-05-01 03:00:00 02:00