I cannot figure out how to assign a timezone to a dataframe. I have a df called 'df' that looks like this:
df
Out[67]:
date daily_flow
0 2002-02-13 144000.0
1 2002-02-14 184000.0
2 2002-02-15 159000.0
3 2002-02-16 126000.0
4 2002-02-17 114000.0
... ...
7277 2022-02-02 152000.0
7278 2022-02-03 159000.0
7279 2022-02-04 150000.0
7280 2022-02-05 165000.0
7281 2022-02-06 148000.0
[7282 rows x 2 columns]
df.dtypes
Out[68]:
date datetime64[ns]
daily_flow float64
dtype: object
I have read the documentation and other posts, and it is still not clear how to assign a timezone such as 'US/Pacific' to the "date" column. Thank you!
Here is an example of an error I keep getting when I try to assign a timezone (UTC) to the datetime column ('date').
df.date.tz_localize('UTC')
Traceback (most recent call last):
Input In [10] in <module>
df.tz_localize('UTC')
File ~\Anaconda3\envs\ARIMA\lib\site-packages\pandas\core\generic.py:9977 in tz_localize
ax = _tz_localize(ax, tz, ambiguous, nonexistent)
File ~\Anaconda3\envs\ARIMA\lib\site-packages\pandas\core\generic.py:9959 in _tz_localize
raise TypeError(
TypeError: index is not a valid DatetimeIndex or PeriodIndex
CodePudding user response:
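To set a time zone for a column, convert it with pd.to_datetime (if it is not already datetime64) and call tz_localize through the dt accessor of the Series. Without .dt, tz_localize acts on the index rather than on the column values, which is why you get the TypeError. Example: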
df
Out[3]:
date daily_flow
0 2002-02-13 144000.0
1 2002-02-14 184000.0
2 2002-02-15 159000.0
df['date'] = pd.to_datetime(df['date']).dt.tz_localize('UTC')
df['date']
Out[5]:
0 2002-02-13 00:00:00+00:00
1 2002-02-14 00:00:00+00:00
2 2002-02-15 00:00:00+00:00
Name: date, dtype: datetime64[ns, UTC]
In the example, you can replace 'UTC' with the appropriate time zone, such as 'US/Pacific'. Once the column is time zone aware, you can convert it to another time zone in the same way with .dt.tz_convert('your-time-zone').
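For the 'US/Pacific' case from the question, a minimal sketch might look like the following. The three-row frame is only a stand-in built from the first rows shown above, and the variable names (pacific, as_utc, indexed) are made up for illustration. It assumes the stored dates are meant as Pacific wall-clock times; if they are really UTC, localize to 'UTC' first and then convert.

```python
import pandas as pd

# Small stand-in for the frame in the question (first three rows only).
df = pd.DataFrame({
    "date": pd.to_datetime(["2002-02-13", "2002-02-14", "2002-02-15"]),
    "daily_flow": [144000.0, 184000.0, 159000.0],
})

# tz_localize attaches a zone to naive timestamps; the wall-clock time is unchanged.
pacific = df["date"].dt.tz_localize("US/Pacific")

# tz_convert moves aware timestamps to another zone; the instant is unchanged,
# only the displayed wall-clock time shifts.
as_utc = pacific.dt.tz_convert("UTC")

# Without .dt, tz_localize works on the index, so it only succeeds
# once the dates have been made the index.
indexed = df.set_index("date").tz_localize("US/Pacific")

print(pacific)
print(as_utc)
print(indexed.index)
```

The last variant is the one to use if you would rather keep the dates as the index, which is what df.tz_localize expects.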