According to the documentation, the format argument of pd.to_datetime describes the input format, using the same directives as Python's strftime.
Using Python's strptime:
import datetime
date = '19981128'
date_python = datetime.datetime.strptime(date, '%Y/%m/%d')
Output:
ValueError: time data '19981128' does not match format '%Y/%m/%d'
Using the same format with pandas:
import pandas as pd
import datetime
date = '19981128'
date_pandas = pd.to_datetime(date, errors='coerce', format='%Y/%m/%d')
print(date_pandas)
Output:
1998-11-28 00:00:00
The pandas documentation clearly says that datetime parsing in pandas replicates the behaviour of Python's strptime. Why is that not happening here?
CodePudding user response:
pandas has a "special fast-path for iso8601 formatted datetime strings". ISO 8601 specifies YMD order for a date.
From pandas\core\tools\datetimes.py:
if format is not None:
    # There is a special fast-path for iso8601 formatted
    # datetime strings, so in those cases don't use the inferred
    # format because this path makes process slower in this
    # special case
    format_is_iso8601 = format_is_iso(format)
    if format_is_iso8601:
        require_iso8601 = not infer_datetime_format
        format = None
When the provided format passes the format_is_iso() test, the fast-path code is used. Through experimentation, that path accepts a slash, a hyphen, or nothing between the year, month and day. You can also see below that when YMD order isn't used, parsing is stricter and the call fails:
import pandas as pd
print(repr(pd.to_datetime('19981106',format='%Y/%m/%d')))
print(repr(pd.to_datetime('1998/11/06',format='%Y/%m/%d')))
print(repr(pd.to_datetime('1998-11-06',format='%Y/%m/%d')))
print(repr(pd.to_datetime('11061998',format='%m/%d/%Y'))) # not using fast-path, more strict
Output:
Timestamp('1998-11-06 00:00:00')
Timestamp('1998-11-06 00:00:00')
Timestamp('1998-11-06 00:00:00')
Traceback (most recent call last):
...
ValueError: time data '11061998' does not match format '%m/%d/%Y' (match)
I don't see a way to force an exact match of the format, unfortunately. There is an exact=True argument, but that is already the default.
You could always force strptime:
>>> import pandas as pd
>>> df = pd.DataFrame(['19981128','20220222'])
>>> df
          0
0  19981128
1  20220222
>>> from datetime import datetime
>>> df[0].apply(lambda x: datetime.strptime(x, '%Y/%m/%d')) # now it fails
Traceback (most recent call last):
...
ValueError: time data '19981128' does not match format '%Y/%m/%d'
>>> df[0].apply(lambda x: datetime.strptime(x, '%Y%m%d'))
0 1998-11-28
1 2022-02-22
Name: 0, dtype: datetime64[ns]
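If applying strptime row by row is too slow for a large column, another option is to check the shape of each string first and only then hand it to to_datetime. This is a minimal sketch, assuming every valid value must look exactly like YYYY/MM/DD; the names raw, pattern, shape_ok and parsed are illustrative and not from the original post.

```python
import pandas as pd

raw = pd.Series(['1998/11/28', '19981128', '2022/02/22'])

# Keep only strings shaped exactly like YYYY/MM/DD.
pattern = r'\d{4}/\d{2}/\d{2}'
shape_ok = raw.str.fullmatch(pattern)

# Values that failed the shape check are replaced by NaN first,
# so to_datetime turns them into NaT instead of parsing them leniently.
parsed = pd.to_datetime(raw.where(shape_ok), format='%Y/%m/%d', errors='coerce')
print(parsed)
```

The regular expression only checks the layout of digits and slashes; whether the digits form a real calendar date is still left to to_datetime.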
CodePudding user response:
The documentation says:
Returns
datetime
If parsing succeeded. Return type depends on input (types in parenthesis correspond to fallback in case of unsuccessful timezone or out-of-range timestamp parsing):
scalar: Timestamp (or datetime.datetime)
(emphasis mine)
and
Notes
Many input types are supported, and lead to different output types:
scalars can be int, float, str, datetime object (from stdlib datetime module or numpy). They are converted to Timestamp when possible, otherwise they are converted to datetime.datetime.
(emphasis mine)
and
The following causes are responsible for datetime.datetime objects being returned [...]
None of the reasons listed there apply to your case.
Your date can be converted into a timestamp, so datetime is not used.
print(pd.Timestamp(date))
Output:
1998-11-28 00:00:00
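A quick way to see what kind of object you get back is to inspect it directly. The sketch below assumes only that pandas is installed; the variable name ts is illustrative. It relies on the fact that pd.Timestamp is a subclass of datetime.datetime, so the result can still be used where a datetime is expected.

```python
import datetime
import pandas as pd

ts = pd.Timestamp('19981128')

# The object is a pandas Timestamp, which subclasses datetime.datetime.
print(type(ts))
print(isinstance(ts, datetime.datetime))

# Convert to a plain standard-library datetime if one is really needed.
print(ts.to_pydatetime())
```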
You can parse with strptime first and pass the result to pandas, if you like:
import pandas as pd
import datetime
date = '19981128'
try:
    date_python = datetime.datetime.strptime(date, '%Y/%m/%d')
    date_pandas = pd.to_datetime(date_python, errors='coerce')
except ValueError:
    date_pandas = pd.NaT
print(date_pandas)
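The same idea can be wrapped in a small helper and applied to a whole column. This is only a sketch: parse_strict is a hypothetical name, not a pandas function, and it assumes that missing or non-string values should become NaT as well.

```python
import datetime
import pandas as pd

def parse_strict(value, fmt):
    """Return a Timestamp if value matches fmt under strptime, else NaT."""
    try:
        return pd.Timestamp(datetime.datetime.strptime(value, fmt))
    except (ValueError, TypeError):
        # ValueError: the string does not match fmt.
        # TypeError: the value is not a string (e.g. None).
        return pd.NaT

dates = pd.Series(['19981128', '1998/11/28', None])
result = dates.apply(lambda v: parse_strict(v, '%Y/%m/%d'))
print(result)
```

Only the second value matches '%Y/%m/%d' exactly, so it is the only one that is parsed; the other two become NaT.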