Python (pandas) equivalent of R's as.numeric()

Time:09-22

I'm still getting to grips with working in pandas.

What I'd like to do is convert a column's type from string to integer. The column is stored as string data but contains mostly integer values. I'd like the whole column to be of type integer. On the few occasions where conversion is not possible, I'd like the value to become NA / NaN.

I'm migrating from R, where this behaviour is somewhat expected:

df <- data.frame(
  "id" = c(1,2,3),
  "age" = c("12", "not_an_age", "34 and a half")
)

converted_df <- dplyr::mutate(df, age = as.numeric(age))

converted_df
### output
# id age
# 1  12
# 2  NA
# 3  NA

In Python

import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3], 'age': ['12', 'not_an_age', '34 and a half']})

# not run
# astype only allows errors to be raised or ignored, not coerced
df['age'].astype('int')

How can I create the result I expect from R, in pandas? It feels like there is a function/argument to a function I'm forgetting about.

Thanks

CodePudding user response:

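To hold a mix of integers and missing values, first coerce with pd.to_numeric, then cast to one of pandas' nullable integer dtypes (IntXXDtype, for example Int16Dtype):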

>>> pd.to_numeric(df.age, errors='coerce').astype(pd.Int16Dtype())
0      12
1    <NA>
2    <NA>
Name: age, dtype: Int16
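
For completeness, here is a minimal self-contained sketch of the same two-step approach applied to the DataFrame from the question. The choice of "Int64" rather than Int16 is an assumption made here so that larger values fit; the intermediate name `parsed` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "age": ["12", "not_an_age", "34 and a half"]})

# Step 1: parse the strings; anything that cannot be parsed becomes NaN.
# Because NaN is a float, the intermediate result has dtype float64.
parsed = pd.to_numeric(df["age"], errors="coerce")

# Step 2: cast to a nullable integer dtype, which stores missing values as <NA>.
# "Int64" (capital I) is the string alias for pd.Int64Dtype().
df["age"] = parsed.astype("Int64")

print(df.dtypes)
print(df)
```

If a float column is acceptable, step 1 alone is probably the closest match to R's as.numeric(), which also returns a floating-point vector.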

If you cast to plain int instead, it will raise an exception, because NaN cannot be represented in a NumPy integer column:

>>> pd.to_numeric(df.age, errors='coerce').astype(int)
...
IntCastingNaNError: Cannot convert non-finite values (NA or inf) to integer
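
If a plain NumPy int column really is required, one possible workaround is to replace the missing values with a sentinel before casting. The sketch below assumes -1 is never a valid age; that sentinel and the variable names are illustrative choices, not part of the original answer.

```python
import pandas as pd

ages = pd.Series(["12", "not_an_age", "34 and a half"], name="age")
parsed = pd.to_numeric(ages, errors="coerce")  # float64 with NaN for bad values

try:
    parsed.astype(int)
    cast_failed = False
except ValueError:
    # In recent pandas versions this is IntCastingNaNError,
    # which is a subclass of ValueError.
    cast_failed = True

# Fill the gaps with a sentinel first, then the plain int cast succeeds.
# -1 is an arbitrary placeholder chosen for this example.
with_sentinel = parsed.fillna(-1).astype(int)

print(cast_failed)
print(with_sentinel.tolist())
```

The trade-off is that the sentinel looks like real data to later code, so the nullable Int dtypes are usually the safer choice.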