In my DataFrame I have a column "_date_". This column contains many strings, only some of which are dates. I'm trying to convert the strings that contain dates into datetimes.
I have used pd.to_datetime with errors='coerce', but some non-date strings are also converted to dates. For example, "--70--" is converted to "2070-01-01".
How can I prevent pandas from converting non-date strings to dates?
import pandas as pd

df = pd.DataFrame(
    {'_date_': ['22 Jun 2023', '123', '--99--', '--70--', 'b', 'c', 'd']})
df['_date_'] = pd.to_datetime(df['_date_'], errors='coerce')
df['_date_']
0 2023-06-22
1 NaT
2 1999-01-01
3 2070-01-01
4 NaT
5 NaT
6 NaT
Name: _date_, dtype: datetime64[ns]
CodePudding user response:
If you have a specific date format, then be specific:
pd.to_datetime(df['_date_'], format='%d %b %Y', errors='coerce')
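As a minimal sketch of the explicit-format approach, here it is applied to the sample data from the question. The variable names `s` and `strict` are only illustrative, and the format string assumes every real date looks like "22 Jun 2023":

```python
import pandas as pd

s = pd.Series(['22 Jun 2023', '123', '--99--', '--70--', 'b', 'c', 'd'],
              name='_date_')

# With an explicit format, only strings shaped like "22 Jun 2023" parse;
# anything that does not match becomes NaT because of errors='coerce'.
strict = pd.to_datetime(s, format='%d %b %Y', errors='coerce')

# Boolean list showing which rows were recognised as dates.
print(strict.notna().tolist())
```

Only the first row should come out as a valid timestamp here; "--99--" and "--70--" no longer match anything in the format and are coerced to NaT.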
Otherwise, you can replace a specific unwanted character so that the string no longer parses as a date. Here that is a "-" at the start of the string; a "-" in the middle is left alone so that something like 2022-08-29 still parses correctly:
pd.to_datetime(df['_date_'].str.replace('^-', 'invalid', regex=True),
errors='coerce')
output:
0 2023-06-22
1 NaT
2 NaT
3 NaT
4 NaT
5 NaT
6 NaT
Name: _date_, dtype: datetime64[ns]
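A possible variation on the same idea, offered only as a sketch: instead of invalidating known bad prefixes, allow only strings whose shape matches the expected date layout and blank out everything else before parsing. The regular expression and the names `pattern`, `looks_like_date` and `parsed` are assumptions for illustration, not part of the original answer, and the pattern would need adjusting for other date layouts.

```python
import pandas as pd

df = pd.DataFrame(
    {'_date_': ['22 Jun 2023', '123', '--99--', '--70--', 'b', 'c', 'd']})

# Assumed layout: day, three-letter month abbreviation, four-digit year.
pattern = r'\d{1,2} [A-Za-z]{3} \d{4}'

# True only where the whole string matches the assumed layout.
looks_like_date = df['_date_'].str.fullmatch(pattern)

# Non-matching rows are replaced by NaN first, so they end up as NaT.
parsed = pd.to_datetime(df['_date_'].where(looks_like_date),
                        format='%d %b %Y', errors='coerce')
print(parsed)
```

The trade-off is that a whitelist like this rejects any valid date written in a layout the pattern does not cover, whereas the replacement approach above only rejects strings starting with "-".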