How to parse correctly to date in python from specific ISO format-CodePudding

I connected to a table on a db where there are two columns with dates. I had no problem to parse the column with values formatted as this: 2017-11-03
But I don't find a way to parse the other column with dates formatted as this: 2017-10-03 05:06:52.840 02:00

My attempts
If I parse a single value through the strptime method

dt.datetime.strptime("2017-12-14 22:16:24.037  02:00", "%Y-%m-%d %H:%M:%S.%f %z")

I get the correct output
datetime.datetime(2017, 12, 14, 22, 16, 24, 37000, tzinfo=datetime.timezone(datetime.timedelta(seconds=7200)))

but if I try to use the same code format while parsing the table to the dataframe, the column dtype is an object:

Licenze_FromGY = pd.read_sql(query, cnxn, parse_dates={"EndDate":"%Y-%m-%d", "LastUpd":"%Y-%m-%d %H:%M:%S.%f %z"})


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 10 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   Tenant     1000 non-null   int64         
 1   IdService  1000 non-null   object        
 2   Code       1000 non-null   object        
 3   Aggregate  1000 non-null   object        
 4   Bundle     991 non-null    object        
 5   Status     1000 non-null   object        
 6   Value      1000 non-null   int64         
 7   EndDate    258 non-null    datetime64[ns]
 8   Trial      1000 non-null   bool          
 9   LastUpd    1000 non-null   object

I also tried to change the code format either in the read_sql method or in the pd.to_datetime() method, but then all the values become NaT:

Licenze_FromGY["LastUpd"] = pd.to_datetime(Licenze_FromGY["LastUpd"], format="%Y-%m-%d %H:%M:%S.%fZ", errors="coerce") 

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 10 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   Tenant     1000 non-null   int64         
 1   IdService  1000 non-null   object        
 2   Code       1000 non-null   object        
 3   Aggregate  1000 non-null   object        
 4   Bundle     991 non-null    object        
 5   Status     1000 non-null   object        
 6   Value      1000 non-null   int64         
 7   EndDate    258 non-null    datetime64[ns]
 8   Trial      1000 non-null   bool          
 9   LastUpd    0 non-null      datetime64[ns]
dtypes: bool(1), datetime64[ns](2), int64(2), object(5)
memory usage: 71.4  KB
None

Anyone can help?

CodePudding user response：

pandas cannot handle mixed UTC offsets in one Series (column). Assuming you have data like this

import pandas as pd

df = pd.DataFrame({"datetime": ["2017-12-14 22:16:24.037  02:00", 
                                "2018-08-14 22:16:24.037  03:00"]})

if you just parse to datetime,

df["datetime"] = pd.to_datetime(df["datetime"])

df["datetime"]
0    2017-12-14 22:16:24.037000 02:00
1    2018-08-14 22:16:24.037000 03:00
Name: datetime, dtype: object

you get dtype object. The elements of the series are of the Python datetime.datetime dtype. That limits the datetime functionality, compared to the pandas datetime dtype.

You can get that e.g. by parsing to UTC:

df["datetime"] = pd.to_datetime(df["datetime"], utc=True)

df["datetime"]
0   2017-12-14 20:16:24.037000 00:00
1   2018-08-14 19:16:24.037000 00:00
Name: datetime, dtype: datetime64[ns, UTC]

You might set an appropriate time zone to re-create the UTC offset:

df["datetime"] = pd.to_datetime(df["datetime"], utc=True).dt.tz_convert("Europe/Athens")

df["datetime"]
0   2017-12-14 22:16:24.037000 02:00
1   2018-08-14 22:16:24.037000 03:00
Name: datetime, dtype: datetime64[ns, Europe/Athens]