I have a column with dates (Format: 2022-05-15) with the current dtype: object. I want to change the dtype to datetime with the following code:
df['column'] = pd.to_datetime(df['column'])
I receive the error:
ParserError: Unknown string format: DU2999
I'm changing multiple columns (e.g. another date column with format dd-mm-yyyy hh-mm-ss), but I get the error only for the column mentioned above. Thank you very much in advance for your help.
CodePudding user response:
As @Naveed said, there are invalid date strings in the date column, such as DU2999. You can find out which strings are not in date format like this:
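# Values that cannot be parsed become NaT instead of raising an error
temp_date = pd.to_datetime(df['Date_column'], errors='coerce')
# True where parsing failed but the original value was not missing
mask = temp_date.isna() & df['Date_column'].notna()
# Problematic rows: keep only the rows where the mask is True
df_problematic = df[mask]
print(df_problematic)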
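To try this out without the original data, here is a minimal self-contained sketch. The sample values are made up for illustration (only "DU2999" comes from the question), and it assumes the valid dates follow the year-month-day format shown in the question, so an explicit format string is passed.

```python
import pandas as pd

# Hypothetical sample data; "DU2999" stands in for the invalid entry from the question.
df = pd.DataFrame({"column": ["2022-05-15", "DU2999", None, "2022-06-01"]})

# Parse with an explicit format; anything that does not match becomes NaT.
parsed = pd.to_datetime(df["column"], format="%Y-%m-%d", errors="coerce")

# Rows that failed to parse but were not missing in the first place.
bad_rows = df[parsed.isna() & df["column"].notna()]
bad_values = bad_rows["column"].tolist()
print(bad_values)
```

Excluding values that were already missing keeps the result limited to strings that are actually malformed.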
CodePudding user response:
If you want to handle this error by setting the resulting datetime value to NaT whenever the input value is "DU2999" (or another string that does not match the expected format), you can use:
df['column'] = pd.to_datetime(df['column'], errors='coerce')
See https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html for details on the errors parameter.
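As a rough illustration of what errors='coerce' does, the sketch below uses made-up sample data and assumes the valid entries are in year-month-day format. It converts the column and counts how many values ended up as NaT, which is worth checking so that invalid entries are not lost silently.

```python
import pandas as pd

# Hypothetical sample data; only "DU2999" is taken from the question.
df = pd.DataFrame({"column": ["2022-05-15", "DU2999", "2022-06-01"]})

# Invalid strings are converted to NaT instead of raising ParserError.
df["column"] = pd.to_datetime(df["column"], format="%Y-%m-%d", errors="coerce")

# Number of entries that could not be parsed.
n_invalid = int(df["column"].isna().sum())
print(df["column"].dtype, n_invalid)
```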
If you want to correct this specific case manually, you can view the affected row of the dataframe with:
print(df.loc[df['column'] == "DU2999"])
and then decide what to overwrite it with.
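A manual correction could look roughly like the following sketch. The replacement date "2022-05-16" is purely a placeholder, since the thread does not say what the correct value should be, and the sample data is hypothetical as well.

```python
import pandas as pd

# Hypothetical sample data; only "DU2999" is taken from the question.
df = pd.DataFrame({"column": ["2022-05-15", "DU2999"]})

# Overwrite the invalid entry; "2022-05-16" is a made-up replacement value.
df.loc[df["column"] == "DU2999", "column"] = "2022-05-16"

# With the bad value fixed, the strict conversion no longer raises.
df["column"] = pd.to_datetime(df["column"], format="%Y-%m-%d")
print(df)
```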