Converting Str Column to Int Not Working in Pandas

Time:06-06

I have the following code to load data:

import pandas as pd
data = pd.read_csv("Salary-Data.csv")
data["Income"] = data["Income"].str.strip()
#data["Income"] = data["Income"].apply(pd.to_numeric, errors='coerce')
#data["Income"] = data["Income"].astype(int)
data

With the astype(int) line uncommented, this produces the following error:

~/miniconda3/envs/scientific-base/lib/python3.8/site-packages/pandas/_libs/lib.pyx in pandas._libs.lib.astype_intsafe()

ValueError: invalid literal for int() with base 10: '16\xa0638'

The first value in the Income column is 16 638 (with a space).

If I comment out the erroring line and inspect the dataframe, the values in the Income column still contain spaces.

What is going on? How can I make this column into one of valid integers or floats?

CodePudding user response:

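Use replace instead of strip. strip only removes leading and trailing whitespace, and the error message shows that the separator inside the value is a non-breaking space (\xa0), not a regular space:

data["Income"] = data["Income"].str.replace('\xa0', '', regex=False)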
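A minimal sketch of the difference, assuming the column holds strings like the one in the traceback. The two sample values are made up and stand in for the real Salary-Data.csv, which is not shown in the question.

```python
import pandas as pd

# Hypothetical sample standing in for the Income column of Salary-Data.csv.
income = pd.Series(["16\xa0638", " 20\xa0000 "])

# str.strip() only trims the ends, so the inner non-breaking space survives.
stripped = income.str.strip()

# Remove non-breaking and regular spaces everywhere, then convert to int.
cleaned = (
    income.str.replace("\xa0", "", regex=False)
          .str.replace(" ", "", regex=False)
          .astype(int)
)

print(stripped.tolist())
print(cleaned.tolist())
```

Passing regex=False makes the intent explicit: these are literal character replacements, not patterns.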

CodePudding user response:

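Here is another way to do it: remove every non-digit character with a regex and assign the result back to the column

data["Income"] = data["Income"].replace(r'\D', '', regex=True)

To keep the decimal separator (a comma or a period) as part of the number

data["Income"] = data["Income"].replace(r'[^0-9,\.]', '', regex=True)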
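After the unwanted characters are gone the column still holds strings, so a conversion step is needed. The following is a sketch under assumptions: the sample rows are invented, the decimal separator is taken to be a period, and pd.to_numeric with errors='coerce' (the call commented out in the question) is used so that leftover bad values become NaN instead of raising.

```python
import pandas as pd

# Hypothetical sample values; the real data comes from Salary-Data.csv.
data = pd.DataFrame({"Income": ["16\xa0638", "1\xa0250.50", "n/a"]})

# Drop everything except digits and the decimal point.
digits_only = data["Income"].str.replace(r"[^0-9.]", "", regex=True)

# errors="coerce" yields NaN for values that cannot be parsed,
# so the "n/a" row ends up as NaN rather than raising ValueError.
data["Income"] = pd.to_numeric(digits_only, errors="coerce")

print(data["Income"].tolist())
```

If the file uses a comma as the decimal separator, it would have to be replaced with a period before the conversion.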