I have a data frame in which one column has the "object" dtype. I use pd.to_numeric() with errors='coerce' to convert it to the "float" dtype. However, every entry in the converted column comes out as NaN. If I set errors='ignore', none of the entries are converted to float. Is there something I am missing? Here is the code snippet:
pd.to_numeric(df['gender'], errors='coerce')
The column df['gender'] contains 'Male' and 'Female' entries. I would like to convert these to the float dtype.
Thank you!
Answer:
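to_numeric can only convert values that already look like numbers. For example, it can convert the string '10' into the number 10, but it cannot convert a label like 'Male' into a number. With errors='coerce', every value it cannot parse becomes NaN, which is why the whole gender column turns into NaN.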
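A minimal sketch of that behavior, using a small hypothetical Series rather than the original data frame: the numeric-looking string parses, while the labels are coerced to NaN, so a column made only of 'Male' and 'Female' ends up entirely NaN.

```python
import pandas as pd

# Hypothetical mixed column: one numeric-looking string and two labels.
s = pd.Series(['10', 'Male', 'Female'])

# errors='coerce' replaces anything that cannot be parsed as a number with NaN.
converted = pd.to_numeric(s, errors='coerce')
print(converted.tolist())  # [10.0, nan, nan]

# A column containing only labels (like the gender column) becomes all NaN.
gender = pd.Series(['Male', 'Female', 'Female'])
all_nan = pd.to_numeric(gender, errors='coerce').isna().all()
print(all_nan)  # True
```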
Instead, use pd.factorize:
df['gender'] = pd.factorize(df['gender'])[0].astype(float)
Or use Series.factorize:
df['gender'] = df['gender'].factorize()[0].astype(float)
factorize returns a tuple of (codes, uniques). The first element contains the integer codes, which we then convert with astype(float).
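To make the (codes, uniques) tuple concrete, here is a small sketch on a hypothetical four-row frame. Note that by default factorize numbers the labels in order of first appearance, so which label gets 0 depends on the row order (passing sort=True sorts the uniques instead), and missing values receive the code -1.

```python
import pandas as pd

# Hypothetical data; 'Female' appears first, so it is assigned code 0.
df = pd.DataFrame({'gender': ['Female', 'Male', 'Female', 'Male']})

# factorize returns (integer codes, unique values in order of first appearance).
codes, uniques = pd.factorize(df['gender'])
print(codes)          # [0 1 0 1]
print(list(uniques))  # ['Female', 'Male']

# Keep only the codes and store them as floats.
df['gender'] = codes.astype(float)
print(df['gender'].tolist())  # [0.0, 1.0, 0.0, 1.0]
```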
Or, as you commented, Series.map also works:
df['gender'] = df['gender'].map({'Male': 0, 'Female': 1}).astype(float)
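With map, the mapping is fixed by the dictionary rather than by row order. One caveat worth checking, sketched below with a hypothetical lowercase 'female' entry: any value missing from the dictionary becomes NaN, so it can help to look for NaN after the conversion. The gender_num column name is also just for illustration.

```python
import pandas as pd

# Hypothetical data with one value ('female') that is not in the mapping.
df = pd.DataFrame({'gender': ['Male', 'Female', 'female', 'Male']})

# Values not found in the dict are mapped to NaN.
df['gender_num'] = df['gender'].map({'Male': 0, 'Female': 1}).astype(float)
print(df['gender_num'].tolist())  # [0.0, 1.0, nan, 0.0]

# Quick check for labels the mapping did not cover.
unmapped = df.loc[df['gender_num'].isna(), 'gender'].unique()
print(list(unmapped))  # ['female']
```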