What is the best way to convert the datatypes of the columns of a Pandas dataframe using a dict of data types?
e.g. I have a dataframe df:
d = {'col1': ["1", "abc"], 'col2': ["abc", "02-02-2021"]}
df = pd.DataFrame(data=d)
and I have a dict:
dtype_dict = { "col1": int
"col2": datetime}
When a value in a column cannot be converted to the correct datatype, I need it to be set to NaN (similar behaviour to the errors='coerce' parameter in pd.to_numeric).
Expected output:
d_out = {'col1': [1, NaN], 'col2': [NaN, '02-02-2021']}
df_out = pd.DataFrame(data=d_out)
My actual dataset consists of multiple large pandas dataframes and corresponding dicts, so I am looking for an automated way to convert complete dataframes.
Thanks!
CodePudding user response:
If you slightly modify your dict, you can use:
from functools import partial
# partial pre-binds errors='coerce', so each converter can be applied to a whole column
dtype_dict = {"col1": partial(pd.to_numeric, errors='coerce'),
              "col2": partial(pd.to_datetime, errors='coerce')}
out = df.agg(dtype_dict)
>>> out
col1 col2
0 1.0 NaT
1 NaN 2021-02-02
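If you would rather keep your original dicts of plain types (int, datetime) untouched, you can translate them on the fly and reuse the same helper for every dataframe. A minimal sketch, assuming a simple type-to-converter mapping (CONVERTERS and coerce_dtypes are hypothetical names, not part of the answer above):

from datetime import datetime
from functools import partial
import pandas as pd

# Assumed mapping from plain types to their coercing pandas converters
CONVERTERS = {
    int: partial(pd.to_numeric, errors='coerce'),
    float: partial(pd.to_numeric, errors='coerce'),
    datetime: partial(pd.to_datetime, errors='coerce'),
}

def coerce_dtypes(df, dtype_dict):
    # Build the per-column converter dict and apply it in one pass with agg
    return df.agg({col: CONVERTERS[typ] for col, typ in dtype_dict.items()})

# Apply the helper to every dataframe/dict pair, e.g.:
# converted = [coerce_dtypes(df, dd) for df, dd in zip(dataframes, dtype_dicts)]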