I have a dataset containing 2 timestamp columns with string format YYYY-mm-dd HH:mm:ss
. The timezone of these timestamps is not included in the string, but they are in UTC and I want to convert them to US/Pacific.
My dataset is in a csv file, so I'm parsing those 2 timestamp columns using
df = pd.read_csv("my_dataset.csv", parse_dates=['timestamp_col1', 'timestamp_col2'])
After reading some posts here it seems the below (example for one column only) could be a possible solution
pd.to_datetime(df.timestamp_col1, utc=True).astimezone(timezone('US/Pacific'))
However, I'm getting the error below. Any insights into how to fix this or is there a better way to do this?
'Series' object has no attribute 'astimezone'
CodePudding user response:
It seems you are looking for dt.tz_convert
:
pd.to_datetime(df.timestamp_col, utc=True)
#0 2022-10-10 10:10:00 00:00
#Name: timestamp_col, dtype: datetime64[ns, UTC]
pd.to_datetime(df.timestamp_col, utc=True).dt.tz_convert(tz="US/Pacific")
#0 2022-10-10 03:10:00-07:00
#Name: timestamp_col, dtype: datetime64[ns, US/Pacific]
Data:
df = pd.read_csv(StringIO("""timestamp_col
2022-10-10 10:10:00
"""), parse_dates=['timestamp_col'])
df
timestamp_col
0 2022-10-10 10:10:00
CodePudding user response:
The pandas.Timestamp
object has its native timezone feature. Localize it with the tz
argument and convert it simply with the tz_convert()
method. So no need to mingle packages
pd.Timestamp.now(tz='UTC').tz_convert('US/Pacific')
This post covered it also.
You can also convert all timestamps of a series by using apply
sr = pd.Series([pd.Timestamp.now(tz='UTC'), pd.Timestamp.now(tz='UTC')])
sr.apply(lambda x: x.tz_convert('US/Pacific'))
One needs to use this workaround since pandas.Series.tz_conver()
expects a pandas.DatetimeIndex
so that it converts the index and not the values.