Python - Parse object index with multiple time zones

Time:09-23

Python Q. How to parse an object index in a data frame into its date, time, and time zone when it has multiple time zones?

The format is "YYYY-MM-DD HH:MM:SS-HH:MM", where the trailing "-HH:MM" is the time zone, given as a UTC offset.

Example: Midnight Jan 1st, 2020 in Mountain Time, counting up:

2020-01-01 00:00:00-07:00
2020-01-01 01:00:00-07:00
2020-01-01 02:00:00-07:00
2020-01-01 04:00:00-06:00

I've got code that works for one time zone, but it breaks when a second timezone is introduced.

df['Date'] = pd.to_datetime(df.index)
df['year']= df['Date'].dt.year
df['month']= df['Date'].dt.month
df['month_n']= df['Date'].dt.month_name()
df['day']= df['Date'].dt.day
df['day_n']= df['Date'].dt.day_name()
df['h']= df['Date'].dt.hour
df['mn']= df['Date'].dt.minute
df['s']= df['Date'].dt.second

ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc="True"

CodePudding user response:

Use pandas.Series.apply instead (df['Date'] is a Series, so each element is handled as an individual timestamp):

df['Date'] = pd.to_datetime(df.index)

df_info = df['Date'].apply(lambda t: pd.Series({
    'date': t.date(),
    'year': t.year,
    'month': t.month,
    'month_n': t.strftime("%B"),
    'day': t.day,
    'day_n': t.strftime("%A"),
    'h': t.hour,
    'mn': t.minute,
    's': t.second,
}))

df = pd.concat([df, df_info], axis=1)
# Output :
print(df)

                                                    Date        date  year  month  month_n  day      day_n  h  mn  s
    col                                                                                                             
    2020-01-01 00:00:00-07:00  2020-01-01 00:00:00-07:00  2020-01-01  2020      1  January    1  Wednesday  0   0  0
    2020-01-01 01:00:00-07:00  2020-01-01 01:00:00-07:00  2020-01-01  2020      1  January    1  Wednesday  1   0  0
    2020-01-01 02:00:00-07:00  2020-01-01 02:00:00-07:00  2020-01-01  2020      1  January    1  Wednesday  2   0  0
    2020-01-01 04:00:00-06:00  2020-01-01 04:00:00-06:00  2020-01-01  2020      1  January    1  Wednesday  4   0  0
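
The output above does not include the time zone part that the question asks for. As a minimal sketch, assuming every index label is an ISO-style string like the ones in the question, each label can be parsed with the standard library's datetime.fromisoformat, which keeps the offset of every row and does not depend on how pd.to_datetime handles mixed offsets (that behaviour has differed between pandas versions). The column names "time", "utc_offset" and "offset_hours" are made up for illustration.

```python
from datetime import datetime

import pandas as pd

df = pd.DataFrame(
    {"v": [999, 999, 999, 999]},
    index=[
        "2020-01-01 00:00:00-07:00",
        "2020-01-01 01:00:00-07:00",
        "2020-01-01 02:00:00-07:00",
        "2020-01-01 04:00:00-06:00",
    ],
)

# Parse every label on its own (Python 3.7+), so each row keeps its own offset.
parsed = [datetime.fromisoformat(label) for label in df.index]

df["date"] = [t.date() for t in parsed]
df["time"] = [t.time() for t in parsed]  # local wall-clock time, offset dropped
df["utc_offset"] = [t.strftime("%z") for t in parsed]  # "-0700" or "-0600"
df["offset_hours"] = [t.utcoffset().total_seconds() / 3600 for t in parsed]

print(df[["date", "time", "utc_offset", "offset_hours"]])
```

Like the apply version, this is a plain Python loop, so it is not vectorized, but it keeps the local time and the offset side by side.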

CodePudding user response:

@abokey's answer is great if you aren't sure of the actual time zone or cannot work with UTC. However, with that approach you don't have the dt accessor, and you lose the performance of a "vectorized" approach.

So if you can use UTC or set a time zone (you only have UTC offsets at the moment!), e.g. "America/Denver", everything works as expected:

import pandas as pd

df = pd.DataFrame({'v': [999,999,999,999]},
                   index = ["2020-01-01 00:00:00-07:00",
                            "2020-01-01 01:00:00-07:00",
                            "2020-01-01 02:00:00-07:00",
                            "2020-01-01 04:00:00-06:00"])

df['Date'] = pd.to_datetime(df.index, utc=True)

print(df.Date.dt.hour)
# 2020-01-01 00:00:00-07:00     7
# 2020-01-01 01:00:00-07:00     8
# 2020-01-01 02:00:00-07:00     9
# 2020-01-01 04:00:00-06:00    10
# Name: Date, dtype: int64

# Note: the hours changed because we converted to UTC!

or

df['Date'] = pd.to_datetime(df.index, utc=True).tz_convert("America/Denver")
print(df.Date.dt.hour)
# 2020-01-01 00:00:00-07:00    0
# 2020-01-01 01:00:00-07:00    1
# 2020-01-01 02:00:00-07:00    2
# 2020-01-01 04:00:00-06:00    3
# Name: Date, dtype: int64
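
If the original wall-clock hours and the per-row offset are both needed while staying vectorized, one possible sketch is to parse the labels twice: once to UTC, and once with the offset cut off. This assumes every label ends in a six-character "+HH:MM" or "-HH:MM" offset, as in the question. The column names "utc", "local" and "offset" are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {"v": [999, 999, 999, 999]},
    index=[
        "2020-01-01 00:00:00-07:00",
        "2020-01-01 01:00:00-07:00",
        "2020-01-01 02:00:00-07:00",
        "2020-01-01 04:00:00-06:00",
    ],
)

# Absolute instants in UTC, parsed in one vectorized call.
df["utc"] = pd.to_datetime(df.index, utc=True)

# Assumption: the last six characters of each label are the UTC offset,
# so slicing them off leaves the local wall-clock time as a naive datetime.
df["local"] = pd.to_datetime(df.index.str[:-6])

# Offset = local wall time minus UTC wall time, as a timedelta column.
df["offset"] = df["local"] - df["utc"].dt.tz_localize(None)

print(df["local"].dt.hour.tolist())
# [0, 1, 2, 4]
print(df["offset"].dt.total_seconds().div(3600).tolist())
# [-7.0, -7.0, -7.0, -6.0]
```

The "local" column is time zone naive, so it supports the dt accessor, but comparisons between rows with different offsets should be done on the "utc" column.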