How to convert String data to int data for preparing linear regression?

Time:12-23

I am trying to prepare my data for regression, so I am trying to convert a string column to integer with this code:

train["comment"] = train["comment"].astype(int)

But I am getting this error :

runfile('C:/Users/hayyi/.spyder-py3/temp.py', wdir='C:/Users/hayyi/.spyder-py3')
Traceback (most recent call last):

  File "C:\Users\hayyi\.spyder-py3\temp.py", line 57, in <module>
    train["comment"] = train["comment"].astype(int)

  File "D:\SpyderUI\MiniConda\envs\spyder-env\lib\site-packages\pandas\core\generic.py", line 5815, in astype
    new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)

  File "D:\SpyderUI\MiniConda\envs\spyder-env\lib\site-packages\pandas\core\internals\managers.py", line 418, in astype
    return self.apply("astype", dtype=dtype, copy=copy, errors=errors)

  File "D:\SpyderUI\MiniConda\envs\spyder-env\lib\site-packages\pandas\core\internals\managers.py", line 327, in apply
    applied = getattr(b, f)(**kwargs)

  File "D:\SpyderUI\MiniConda\envs\spyder-env\lib\site-packages\pandas\core\internals\blocks.py", line 591, in astype
    new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)

  File "D:\SpyderUI\MiniConda\envs\spyder-env\lib\site-packages\pandas\core\dtypes\cast.py", line 1309, in astype_array_safe
    new_values = astype_array(values, dtype, copy=copy)

  File "D:\SpyderUI\MiniConda\envs\spyder-env\lib\site-packages\pandas\core\dtypes\cast.py", line 1257, in astype_array
    values = astype_nansafe(values, dtype, copy=copy)

  File "D:\SpyderUI\MiniConda\envs\spyder-env\lib\site-packages\pandas\core\dtypes\cast.py", line 1174, in astype_nansafe
    return lib.astype_intsafe(arr, dtype)

  File "pandas\_libs\lib.pyx", line 679, in pandas._libs.lib.astype_intsafe

ValueError: invalid literal for int() with base 10: "He got his money... now he lies in wait till after the election in 2 yrs.... dirty politicians need to be afraid of Tar and feathers again... but they aren't and so the people get screwed."

By the way, I also tried this, but I get the same error:

train["comment"] = train["comment"].str.replace(',', '').astype(int)

Another question: is this kind of conversion the right way to prepare string data for regression?

CodePudding user response:

Assuming the column holds numbers stored as strings, try this (with pandas imported as pd):

train['comment']= pd.to_numeric(train['comment'], errors='coerce')
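
To see what errors='coerce' does, here is a minimal sketch on a small made-up frame (the sample values are hypothetical, not the asker's data). Strings that look like numbers are parsed, and anything else becomes NaN instead of raising ValueError:

```python
import pandas as pd

# Hypothetical sample data: two numeric strings and one free-text comment.
train = pd.DataFrame({"comment": ["12", "7", "He got his money..."]})

# errors="coerce" replaces every value that cannot be parsed with NaN
# instead of raising ValueError.
converted = pd.to_numeric(train["comment"], errors="coerce")

print(converted.tolist())  # [12.0, 7.0, nan]
print(converted.dtype)     # float64
```

The result is a float column because NaN cannot be stored in a plain integer column.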

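If the column contains any NaN values (including values that errors='coerce' could not parse), fill them before casting to integer (with numpy imported as np):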

train['comment']= pd.to_numeric(train['comment'], errors='coerce').fillna(0).astype(np.int64)
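
A short sketch of the same chain on hypothetical sample data, with one extra step that is my own addition: counting how many values failed to parse before they are overwritten with 0. The names as_float and n_failed are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not the asker's real column.
train = pd.DataFrame({"comment": ["3", "oops", None, "42"]})

# Unparseable strings and missing values both end up as NaN.
as_float = pd.to_numeric(train["comment"], errors="coerce")

# Count the NaN values before fillna(0) hides them.
n_failed = int(as_float.isna().sum())
print(n_failed)  # 2

# Replace NaN with 0, then cast the float column to 64-bit integers.
train["comment"] = as_float.fillna(0).astype(np.int64)
print(train["comment"].tolist())  # [3, 0, 0, 42]
```

If the column holds free text, like the comment in the traceback above, every row is coerced to NaN and then to 0, so the column carries no information for a regression. Checking the failure count first makes that visible.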