I am trying to prepare my data for regression, so I am trying to convert a string column to integer with this code:
train["comment"] = train["comment"].astype(int)
But I am getting this error:
runfile('C:/Users/hayyi/.spyder-py3/temp.py', wdir='C:/Users/hayyi/.spyder-py3')
Traceback (most recent call last):
  File "C:\Users\hayyi\.spyder-py3\temp.py", line 57, in <module>
    train["comment"] = train["comment"].astype(int)
  File "D:\SpyderUI\MiniConda\envs\spyder-env\lib\site-packages\pandas\core\generic.py", line 5815, in astype
    new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
  File "D:\SpyderUI\MiniConda\envs\spyder-env\lib\site-packages\pandas\core\internals\managers.py", line 418, in astype
    return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
  File "D:\SpyderUI\MiniConda\envs\spyder-env\lib\site-packages\pandas\core\internals\managers.py", line 327, in apply
    applied = getattr(b, f)(**kwargs)
  File "D:\SpyderUI\MiniConda\envs\spyder-env\lib\site-packages\pandas\core\internals\blocks.py", line 591, in astype
    new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
  File "D:\SpyderUI\MiniConda\envs\spyder-env\lib\site-packages\pandas\core\dtypes\cast.py", line 1309, in astype_array_safe
    new_values = astype_array(values, dtype, copy=copy)
  File "D:\SpyderUI\MiniConda\envs\spyder-env\lib\site-packages\pandas\core\dtypes\cast.py", line 1257, in astype_array
    values = astype_nansafe(values, dtype, copy=copy)
  File "D:\SpyderUI\MiniConda\envs\spyder-env\lib\site-packages\pandas\core\dtypes\cast.py", line 1174, in astype_nansafe
    return lib.astype_intsafe(arr, dtype)
  File "pandas\_libs\lib.pyx", line 679, in pandas._libs.lib.astype_intsafe
ValueError: invalid literal for int() with base 10: "He got his money... now he lies in wait till after the election in 2 yrs.... dirty politicians need to be afraid of Tar and feathers again... but they aren't and so the people get screwed."
By the way, I also tried this, but I get the same error:
train["comment"] = train["comment"].str.replace(',', '').astype(int)
Another question: is this kind of conversion the right way to prepare string data for regression?
CodePudding user response:
Assuming the column holds numbers that are stored as strings, try:
import pandas as pd
train['comment'] = pd.to_numeric(train['comment'], errors='coerce')
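To see what errors='coerce' does, here is a minimal sketch on a small made-up frame (the sample values are hypothetical, not your data). Anything that cannot be parsed as a number becomes NaN instead of raising ValueError, so counting the NaN values tells you how many rows were not numeric to begin with:

```python
import pandas as pd

# Hypothetical sample data: two numeric strings and one free-text comment.
train = pd.DataFrame({"comment": ["12", "7", "He got his money..."]})

# errors='coerce' replaces every value that cannot be parsed with NaN
# instead of raising ValueError.
converted = pd.to_numeric(train["comment"], errors="coerce")

print(converted)

# Number of values that could not be parsed as numbers.
print(converted.isna().sum())
```

If most of the column comes back as NaN, the column is probably free text rather than numbers stored as strings, and the assumption above does not hold for it.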
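Values that cannot be parsed become NaN with errors='coerce', and a column containing NaN cannot be cast to int. If you need an integer column, fill the NaN values first:
import numpy as np
train['comment'] = pd.to_numeric(train['comment'], errors='coerce').fillna(0).astype(np.int64)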
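As a rough end-to-end sketch, the comma stripping from the question can be combined with the conversion above. The sample values and the names cleaned, numeric, n_failed and as_int are made up for illustration. Note that fillna(0) silently turns every unparseable value into 0, so it may be worth checking how many values failed before filling:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: a plain number, a number with a thousands
# separator, and a value that is not a number at all.
train = pd.DataFrame({"comment": ["3", "1,200", "not a number"]})

# Remove thousands separators as plain text (no regex needed).
cleaned = train["comment"].str.replace(",", "", regex=False)

# Unparseable values become NaN rather than raising ValueError.
numeric = pd.to_numeric(cleaned, errors="coerce")

# Count the failures before they are hidden by fillna(0).
n_failed = int(numeric.isna().sum())
print(f"{n_failed} value(s) could not be parsed")

# Replace NaN with 0 and cast to a 64-bit integer column.
as_int = numeric.fillna(0).astype(np.int64)
print(as_int.tolist())
```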