Home > Mobile >  Converting multipe columns in a dataframe to date generating error (Boolean array expected for the c
Converting multipe columns in a dataframe to date generating error (Boolean array expected for the c

Time:08-22

I have a dataframe with around 10.000 rows and 130 columns, from which 5 of them are date columns. I would like to convert all 5 date columns into date, meaning have all dates in "yyyy-mm-dd" format and, if something cannot be converted, to leave it as NaT.

When I try to apply pd.to_datetime with "errors = coerce" I receive the following error:

ValueError: Boolean array expected for the condition, not object

Those columns can have anything: blanks, numbers, strings, special characters, etc. I just would like a way to identify those and convert into NaT.

Thank you for your help!

CodePudding user response:

Maybe this helps in your case:

import pandas as pd
import numpy as np

df = pd.DataFrame(
    {
        'col1': [np.nan, '2022-08-21', 'not a date', True, '', 999], 
        'col2': [-1, pd.NA, '2000-12-31', '2345-67-89', False, pd.Timestamp.today()]
    }
)

date_cols = ['col1', 'col2']

for col in date_cols:
    df[col] = pd.to_datetime([str(x) for x in df[col]], errors='coerce')

print(df)

print(df.info())

Output 1:

        col1                       col2
0        NaT                        NaT
1 2022-08-21                        NaT
2        NaT 2000-12-31 00:00:00.000000
3        NaT                        NaT
4        NaT                        NaT
5        NaT 2022-08-21 16:38:16.790773

Output 2:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   col1    1 non-null      datetime64[ns]
 1   col2    1 non-null      datetime64[ns]
dtypes: datetime64[ns](2)
memory usage: 224.0 bytes
None
  • Related