I have a dataframe with around 10.000 rows and 130 columns, from which 5 of them are date columns. I would like to convert all 5 date columns into date, meaning have all dates in "yyyy-mm-dd" format and, if something cannot be converted, to leave it as NaT.
When I try to apply pd.to_datetime with "errors = coerce" I receive the following error:
ValueError: Boolean array expected for the condition, not object
Those columns can have anything: blanks, numbers, strings, special characters, etc. I just would like a way to identify those and convert into NaT.
Thank you for your help!
CodePudding user response:
Maybe this helps in your case:
import pandas as pd
import numpy as np
df = pd.DataFrame(
{
'col1': [np.nan, '2022-08-21', 'not a date', True, '', 999],
'col2': [-1, pd.NA, '2000-12-31', '2345-67-89', False, pd.Timestamp.today()]
}
)
date_cols = ['col1', 'col2']
for col in date_cols:
df[col] = pd.to_datetime([str(x) for x in df[col]], errors='coerce')
print(df)
print(df.info())
Output 1:
col1 col2
0 NaT NaT
1 2022-08-21 NaT
2 NaT 2000-12-31 00:00:00.000000
3 NaT NaT
4 NaT NaT
5 NaT 2022-08-21 16:38:16.790773
Output 2:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 col1 1 non-null datetime64[ns]
1 col2 1 non-null datetime64[ns]
dtypes: datetime64[ns](2)
memory usage: 224.0 bytes
None