date_cet col1
---------------------------------------
2021-10-31 02:00:00 02:00 7.0
2021-10-31 02:00:00 02:00 7.0
2021-10-31 02:00:00 02:00 8.0
2021-10-31 02:00:00 01:00 10.0
2021-10-31 02:00:00 01:00 11.0
I have a data frame that has columns looking similar to this. This data is imported from SQL into a Pandas data frame, and when I print out the dtypes
I can see that the date_cet
column is object
. Since I need it further on, I want to convert it to a datetime object. However, the stuff I've tried just doesn't work, and I think it might have something to do with 1) the timezone difference and 2) the fact that this date is where DST changes (i.e. the 01:00
and 02:00
).
I've tried to do stuff like this:
import datetime as dt
df["new_date"] = [dt.datetime.strptime(str(x), "%Y-%m-%d %H:%M:%S %z") for x in df["date_cet"]]
df['new_date']= pd.to_datetime(df['date_cet'])
and a hand full of other stuff.
The first gives an error of:
ValueError: time data '2021-10-31 02:00:00 02:00' does not match format '%Y-%m-%d %H:%M:%S %z'
And the last:
ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True
Basically, I have no idea how to fix this. I just need this column to become a datetime[ns, Europe/Copenhagen] type of column, but everything I've done so far doesn't work.
CodePudding user response:
In the datetime string ('2021-10-31 02:00:00 02:00') there is no space between %S
and %z
try to change to this format - "%Y-%m-%d %H:%M:%S%z"
import datetime as dt
df["new_date"] = [dt.datetime.strptime(str(x), "%Y-%m-%d %H:%M:%S%z") for x in df["date_cet"]]
df['new_date']= pd.to_datetime(df['date_cet'])
Update:
to fix the error try adding - utc=True
:
import datetime as dt
df["new_date"] = [dt.datetime.strptime(str(x), "%Y-%m-%d %H:%M:%S%z") for x in df["date_cet"]]
df['new_date']= pd.to_datetime(df['date_cet'], utc=True)
you can do this by one line:
df['new_date']= pd.to_datetime(df['date_cet'], format="%Y-%m-%d %H:%M:%S%z", utc=True)