Home > Net >  Given a pandas dataframe with a UTC timestamp column and a timezone column, generate a local timesta
Given a pandas dataframe with a UTC timestamp column and a timezone column, generate a local timesta

Time:10-07

Here is my data structure:

ids = 1
timezones = ['America/Los_Angeles', 'Europe/Paris', 'America/New_York']
utc_time = ['2020-09-03 19:36:18.534', '2020-09-03 19:36:18.534', '2020-09-03 19:36:18.534']

df = pd.DataFrame({'id':ids,
                   'timezone':timezones,
                   'utc_time':utc_time})
print(df)

   id             timezone                 utc_time
0   1  America/Los_Angeles  2020-09-03 19:36:18.534
1   1         Europe/Paris  2020-09-03 19:36:18.534
2   1     America/New_York  2020-09-03 19:36:18.534

How can I generate a new column added to this dataframe that converts the UTC time to the local time based on the timezone? I have tried a few solutions I've found but haven't been able to get them to work in this format whereby the timezone changes for each row.

Thanks in advance.

CodePudding user response:

This is an instance where I have to use .apply through the rows to get what you need.

The basic steps are:

  1. convert your datestring to timestamps
  2. localize to UTC
  3. convert row-by-row
import pandas

ids = 1
timezones = ['America/Los_Angeles', 'Europe/Paris', 'America/New_York']
datestrings = ['2020-09-03 19:36:18.534', '2020-09-03 19:36:18.534', '2020-09-03 19:36:18.534']

df = pandas.DataFrame({
    'id': ids,
    'timezone': timezones,
    'datestring': datestrings
}).assign(
    # steps 1 & 2:
    utc_time=lambda df: pandas.to_datetime(df["datestring"]).dt.tz_localize("UTC"),
    # step 3:
    local_time=lambda df: df.apply(lambda r: r["utc_time"].tz_convert(r["timezone"]), axis=1)
)

 id             timezone               datestring                         utc_time                        local_time
  1  America/Los_Angeles  2020-09-03 19:36:18.534 2020-09-03 19:36:18.534000 00:00  2020-09-03 12:36:18.534000-07:00
  1         Europe/Paris  2020-09-03 19:36:18.534 2020-09-03 19:36:18.534000 00:00  2020-09-03 21:36:18.534000 02:00
  1     America/New_York  2020-09-03 19:36:18.534 2020-09-03 19:36:18.534000 00:00  2020-09-03 15:36:18.534000-04:00

CodePudding user response:

from dateutil import tz
from datetime import datetime
import pandas as pd
def convert(row):
    from_zone = tz.gettz('UTC')
    to_zone = tz.gettz(row['timezone'])
    utc = datetime.strptime(row['utc_time'], '%Y-%m-%d %H:%M:%S')
    utc = utc.replace(tzinfo=from_zone)
    target = utc.astimezone(to_zone)
    return target.strftime('%m/%d/%Y %H:%M:%S %Z')
data = {'id': [1, 1, 1], 'timezone': ['America/Los_Angeles', 'Europe/Paris', 'America/New_York'], 'utc_time': ['2020-09-03 19:36:18', '2020-09-03 19:36:18', '2020-09-03 19:36:18']}
df = pd.DataFrame(data=data)
df['local_time'] = df.apply(lambda row: convert(row), axis=1)
print(df)

Result is as follows.

   id             timezone             utc_time           local_time
0   1  America/Los_Angeles  2020-09-03 19:36:18  09/03/2020 12:36:18
1   1         Europe/Paris  2020-09-03 19:36:18  09/03/2020 21:36:18
2   1     America/New_York  2020-09-03 19:36:18  09/03/2020 15:36:18
  • Related