Python Q: How do I parse an object index in a DataFrame into its date, time, and time zone when it has multiple time zones?
The format is "YYYY-MM-DD HH:MM:SS-HH:MM", where the trailing "-HH:MM" is the time zone (UTC offset).
Example: Midnight Jan 1st, 2020 in Mountain Time, counting up:
2020-01-01 00:00:00-07:00
2020-01-01 01:00:00-07:00
2020-01-01 02:00:00-07:00
2020-01-01 04:00:00-06:00
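To see what each part of these strings means, here is a minimal sketch that uses only Python's standard library (datetime.fromisoformat, Python 3.7+), not pandas. It splits each sample value into its local date, local time, and UTC offset, and checks that 04:00-06:00 is the same instant as 03:00-07:00. The names stamps, parts, and same_instant are just illustrative.

```python
from datetime import datetime

# Sample index values from the question: local wall-clock time + UTC offset.
stamps = [
    "2020-01-01 00:00:00-07:00",
    "2020-01-01 01:00:00-07:00",
    "2020-01-01 02:00:00-07:00",
    "2020-01-01 04:00:00-06:00",
]

parts = []
for s in stamps:
    ts = datetime.fromisoformat(s)  # tz-aware datetime with a fixed UTC offset
    # (local date, local time, offset as "+HHMM"/"-HHMM")
    parts.append((ts.date().isoformat(), ts.time().isoformat(), ts.strftime("%z")))

for p in parts:
    print(*p)

# Different offsets can describe the same instant: 04:00 at -06:00 is 03:00 at -07:00.
same_instant = (
    datetime.fromisoformat("2020-01-01 04:00:00-06:00")
    == datetime.fromisoformat("2020-01-01 03:00:00-07:00")
)
print(same_instant)
```

Because aware datetimes compare by instant, converting every row to one zone loses no information about when each reading happened; it only changes the wall-clock hour that gets displayed.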
I've got code that works for one time zone, but it breaks when a second time zone is introduced.
df['Date'] = pd.to_datetime(df.index)
df['year']= df['Date'].dt.year
df['month']= df['Date'].dt.month
df['month_n']= df['Date'].dt.month_name()
df['day']= df['Date'].dt.day
df['day_n']= df['Date'].dt.day_name()
df['h']= df['Date'].dt.hour
df['mn']= df['Date'].dt.minute
df['s']= df['Date'].dt.second
ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True
Answer:
Use pandas.Series.apply instead:
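import pandas as pd

df['Date'] = pd.to_datetime(df.index)
# Call the lambda once per timestamp; each returned Series becomes one row of df_info
df_info = df['Date'].apply(lambda t: pd.Series({
    'date': t.date(),
    'year': t.year,
    'month': t.month,
    'month_n': t.strftime("%B"),
    'day': t.day,
    'day_n': t.strftime("%A"),
    'h': t.hour,
    'mn': t.minute,
    's': t.second,
}))
# Append the new columns next to the original ones
df = pd.concat([df, df_info], axis=1)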
# Output:
print(df)
Date date year month month_n day day_n h mn s
col
2020-01-01 00:00:00-07:00 2020-01-01 00:00:00-07:00 2020-01-01 2020 1 January 1 Wednesday 0 0 0
2020-01-01 01:00:00-07:00 2020-01-01 01:00:00-07:00 2020-01-01 2020 1 January 1 Wednesday 1 0 0
2020-01-01 02:00:00-07:00 2020-01-01 02:00:00-07:00 2020-01-01 2020 1 January 1 Wednesday 2 0 0
2020-01-01 04:00:00-06:00 2020-01-01 04:00:00-06:00 2020-01-01 2020 1 January 1 Wednesday 4 0 0
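One caveat: this relies on pd.to_datetime accepting mixed offsets and returning an object column of datetimes. pandas 2.1 deprecated parsing mixed time zones without utc=True (it emits a FutureWarning), and newer versions may raise instead. Below is a minimal sketch of the same apply idea that parses each string individually with pd.Timestamp, so it should not depend on that behavior, and that also keeps the offset as a column. split_stamp and the offset column are hypothetical names; the sample frame mirrors the question's data.

```python
import pandas as pd

# Index of strings with two different UTC offsets, as in the question.
df = pd.DataFrame(
    {"v": [999, 999, 999, 999]},
    index=[
        "2020-01-01 00:00:00-07:00",
        "2020-01-01 01:00:00-07:00",
        "2020-01-01 02:00:00-07:00",
        "2020-01-01 04:00:00-06:00",
    ],
)


def split_stamp(s: str) -> pd.Series:
    """Parse one string into its local date/time fields and its UTC offset."""
    t = pd.Timestamp(s)  # each Timestamp keeps its own fixed offset
    return pd.Series({
        "date": t.date(),
        "year": t.year,
        "month": t.month,
        "month_n": t.month_name(),
        "day": t.day,
        "day_n": t.day_name(),
        "h": t.hour,
        "mn": t.minute,
        "s": t.second,
        "offset": t.strftime("%z"),  # e.g. "-0700"
    })


# Series.apply calls split_stamp once per row and expands the returned Series into columns.
info = df.index.to_series().apply(split_stamp)
df = pd.concat([df, info], axis=1)
print(df)
```

This is still a per-row Python loop, so it shares the performance cost of the approach above; its only advantage is that it never asks pandas to hold mixed offsets in one datetime column.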
Answer:
@abokey's answer is great if you aren't sure of the actual time zone or cannot work with UTC. However, you lose the dt accessor and the performance of a "vectorized" approach, since apply calls a Python function once per row.
So if you can use UTC, or set an actual time zone such as "America/Denver" (at the moment you only have UTC offsets), everything works as expected:
import pandas as pd
df = pd.DataFrame({'v': [999,999,999,999]},
index = ["2020-01-01 00:00:00-07:00",
"2020-01-01 01:00:00-07:00",
"2020-01-01 02:00:00-07:00",
"2020-01-01 04:00:00-06:00"])
df['Date'] = pd.to_datetime(df.index, utc=True)
print(df.Date.dt.hour)
# 2020-01-01 00:00:00-07:00 7
# 2020-01-01 01:00:00-07:00 8
# 2020-01-01 02:00:00-07:00 9
# 2020-01-01 04:00:00-06:00 10
# Name: Date, dtype: int64
# Note: the hours changed because the timestamps were converted to UTC!
Or, to get local hours back, convert to a real time zone:
df['Date'] = pd.to_datetime(df.index, utc=True).tz_convert("America/Denver")
print(df.Date.dt.hour)
# 2020-01-01 00:00:00-07:00 0
# 2020-01-01 01:00:00-07:00 1
# 2020-01-01 02:00:00-07:00 2
# 2020-01-01 04:00:00-06:00 3
# Name: Date, dtype: int64
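Note that converting to "America/Denver" turns the last row's hour from 4 into 3, because 04:00-06:00 and 03:00-07:00 are the same instant. If you need the original wall-clock fields and the offset while keeping the vectorized dt accessor, one possible sketch is below. It assumes every index string is exactly "YYYY-MM-DD HH:MM:SS±HH:MM" (19 characters of local time, then the offset); the column names local, utc, and offset are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'v': [999, 999, 999, 999]},
                  index=["2020-01-01 00:00:00-07:00",
                         "2020-01-01 01:00:00-07:00",
                         "2020-01-01 02:00:00-07:00",
                         "2020-01-01 04:00:00-06:00"])

# Assumption: fixed-width strings, so the first 19 characters are the local time.
df['local'] = pd.to_datetime(df.index.str[:19])   # naive local wall-clock time
df['utc'] = pd.to_datetime(df.index, utc=True)    # same instants in one zone
# Per-row UTC offset = local wall-clock time minus the UTC time.
df['offset'] = df['local'] - df['utc'].dt.tz_localize(None)

# Vectorized field extraction on the local (naive) column.
df['h'] = df['local'].dt.hour
df['day_n'] = df['local'].dt.day_name()
print(df[['h', 'day_n', 'offset']])
```

The offset comes out as a Timedelta (-7 hours, -6 hours), which you can compare or group on; slicing df.index.str[19:] instead would keep it as the original text.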