What is the best way to convert the datatype of columns of a Pandas dataframe using a dict with data


What is the best way to convert the datatype of columns of a Pandas dataframe using a dict with data types?

e.g. I have a dataframe df:

import pandas as pd

d = {'col1': ["1", "abc"], 'col2': ["abc", "02-02-2021"]}
df = pd.DataFrame(data=d)

and I have a dict:

dtype_dict = { "col1": int
               "col2": datetime}

When a value in a column cannot be converted to the correct datatype, I need it to be set to NaN (similar behaviour to the errors='coerce' parameter in pd.to_numeric).

Expected output:

d_out = {'col1': [1, NaN], 'col2': [NaN, 02-02-2021]}
df_out = pd.DataFrame(data=d_out)

My true dataset consists of multiple large pandas DataFrames and corresponding dicts, so I am looking for an automated way to convert the complete DataFrames.

Thanks!

CodePudding user response:

If you slightly modify your dict, you can use:

from functools import partial

dtype_dict = { "col1": partial(pd.to_numeric, errors='coerce'),
               "col2": partial(pd.to_datetime, errors='coerce')}

out = df.agg(dtype_dict)
>>> out
   col1       col2
0   1.0        NaT
1   NaN 2021-02-02
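
If the existing dicts cannot easily be rewritten by hand, one option is to translate them on the fly before applying them. Below is a minimal sketch assuming the dicts map column names to plain types such as int or datetime; the COERCERS table and convert helper are hypothetical names for illustration, not part of pandas:

from datetime import datetime
from functools import partial

import pandas as pd

# Assumed mapping from the plain types used in the dicts to coercing converters.
COERCERS = {
    int: partial(pd.to_numeric, errors='coerce'),
    float: partial(pd.to_numeric, errors='coerce'),
    datetime: partial(pd.to_datetime, errors='coerce'),
}

def convert(df, dtype_dict):
    """Apply the matching coercing converter to each listed column."""
    return df.agg({col: COERCERS[t] for col, t in dtype_dict.items()})

d = {'col1': ["1", "abc"], 'col2': ["abc", "02-02-2021"]}
df = pd.DataFrame(data=d)
out = convert(df, {"col1": int, "col2": datetime})
print(out)

Note that df.agg with a dict only returns the columns listed in the dict, so any columns that should pass through unconverted would need to be joined back onto the result.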