I have a dataframe I am trying to clean. its not dates formatted in a string so Im converting it to datetime using:
df['datecreated']= pd.to_datetime(df['datecreated'], format='%Y-%m-%d %H:%M:%S.%f')
getting an error:
ValueError: time data "2021-09-16 16:32:11.643333" doesn't match format specified
however when I test it separately, it runs:
Why am I getting the ValueError ?
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\arrays\datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
2084 try:
-> 2085 values, tz_parsed = conversion.datetime_to_datetime64(data)
2086 # If tzaware, these values represent unix timestamps, so we
pandas\_libs\tslibs\conversion.pyx in pandas._libs.tslibs.conversion.datetime_to_datetime64()
TypeError: Unrecognized value type: <class 'str'>
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-59-63141dd04215> in <module>
----> 1 df['datecreated']= pd.to_datetime(df['datecreated'], format='%Y-%m-%d %H:%M:%S.%f')
2 df
~\Anaconda3\lib\site-packages\pandas\core\tools\datetimes.py in to_datetime(arg, errors, dayfirst, yearfirst, utc, format, exact, unit, infer_datetime_format, origin, cache)
803 result = arg.map(cache_array)
804 else:
--> 805 values = convert_listlike(arg._values, format)
806 result = arg._constructor(values, index=arg.index, name=arg.name)
807 elif isinstance(arg, (ABCDataFrame, abc.MutableMapping)):
~\Anaconda3\lib\site-packages\pandas\core\tools\datetimes.py in _convert_listlike_datetimes(arg, format, name, tz, unit, errors, infer_datetime_format, dayfirst, yearfirst, exact)
470 errors=errors,
471 require_iso8601=require_iso8601,
--> 472 allow_object=True,
473 )
474
~\Anaconda3\lib\site-packages\pandas\core\arrays\datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
2088 return values.view("i8"), tz_parsed
2089 except (ValueError, TypeError):
-> 2090 raise e
2091
2092 if tz_parsed is not None:
~\Anaconda3\lib\site-packages\pandas\core\arrays\datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
2079 dayfirst=dayfirst,
2080 yearfirst=yearfirst,
-> 2081 require_iso8601=require_iso8601,
2082 )
2083 except ValueError as e:
pandas\_libs\tslib.pyx in pandas._libs.tslib.array_to_datetime()
pandas\_libs\tslib.pyx in pandas._libs.tslib.array_to_datetime()
ValueError: time data "2021-09-16 16:32:11.643333" doesn't match format specified
Update
I tried adding: errors='coerce'
but the datecreated column is now NaT. that doesnt help me
CodePudding user response:
as the error suggests:
ValueError: time data "2021-09-16 16:32:11.643333" doesn't match format specified
the format needed the quotation marks as well.
df['datecreated']= pd.to_datetime(df['datecreated'], format='"%Y-%m-%d %H:%M:%S.%f"')
CodePudding user response:
If your date strings have extra double quotes at both ends, causing the format mismatch error, you are better to remove the double quotes before invoking pd.to_datetime
. In this way, if you have other elements without the double quotes, the codes still work for all entries with and without double quotes:
You can use .str.strip()
to remove the double quotes at both ends:
df['datecreated'] = pd.to_datetime(df['datecreated'].str.strip('"'), format='%Y-%m-%d %H:%M:%S.%f')