df["Dt_Customer"] = pd.to_datetime(df["Dt_Customer"],format='%d-%m-%y')
I have tried to convert the date column, the dataset contains more than 1 million rows... I have to find the date rows which are not getting converted.
TypeError: Unrecognized value type: <class 'str'>
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-124-d701d963ff8c> in <module>
----> 1 df["Dt_Customer"] = pd.to_datetime(df["Dt_Customer"],format='%d-%m-%y')
c:\users\dell\appdata\local\programs\python\python39\lib\site-packages\pandas\core\tools\datetimes.py
in to_datetime(arg, errors, dayfirst, yearfirst, utc, format, exact, unit, infer_datetime_format,
origin, cache)
803 result = arg.map(cache_array)
804 else:
--> 805 values = convert_listlike(arg._values, format)
806 result = arg._constructor(values, index=arg.index, name=arg.name)
807 elif isinstance(arg, (ABCDataFrame, abc.MutableMapping)):
c:\users\dell\appdata\local\programs\python\python39\lib\site-packages\pandas\core\tools\datetimes.py
in _convert_listlike_datetimes(arg, format, name, tz, unit, errors, infer_datetime_format, dayfirst,
yearfirst, exact)
458 return DatetimeIndex._simple_new(dta, name=name)
459 except (ValueError, TypeError):
--> 460 raise e
461
462 if result is None:
c:\users\dell\appdata\local\programs\python\python39\lib\site-packages\pandas\core\tools\datetimes.py
in _convert_listlike_datetimes(arg, format, name, tz, unit, errors, infer_datetime_format, dayfirst,
yearfirst, exact)
421 if result is None:
422 try:
--> 423 result, timezones = array_strptime(
424 arg, format, exact=exact, errors=errors
425 )
pandas\_libs\tslibs\strptime.pyx in pandas._libs.tslibs.strptime.array_strptime()
ValueError: unconverted data remains: 12
CodePudding user response:
an efficient solution would be to parse the date strings to datetime, with keyword errors
set to 'coerce'
. That will give NaT
(not-a-time) for invalid strings. You can derive a boolean mask from that by calling .isnull()
which you can then use to extract respective values.
Ex:
import pandas as pd
df = pd.DataFrame({"Dt_Customer": ["28-12-2020", "not a date"]})
invalid = df.loc[pd.to_datetime(df["Dt_Customer"],
format='%d-%m-%Y',
errors='coerce').isnull(), "Dt_Customer"]
print(invalid)
1 not a date
Name: Dt_Customer, dtype: object
Note that you can also omit the format
keyword to make the check unspecific, i.e. accepting any date/time format the parser can parse.
CodePudding user response:
you could try a loop with try and except:
causing_error_list = []
for x in df["Dt_Customer"].values:
try:
pd.to_datetime(x,format='%d-%m-%y')
except:
causing_error_list.append(x)