I have been taking a course on Python and machine learning. I typed in the code below and got a ValueError:
import numpy as np
from sklearn import preprocessing
raw_csv_data = np.loadtxt("Audiobooks_data.csv", delimiter = ",")
unscaled_inputs_all = raw_csv_data[:,1:-1]
targets_all = raw_csv_data[:,-1]
The error message said:
ValueError Traceback (most recent call last)
Input In [2], in <cell line: 3>()
1 import numpy as np
2 from sklearn import preprocessing ## should standardize inputs using sklearn, accuracy decreases by 10% otherwise
----> 3 raw_csv_data = np.loadtxt("Audiobooks_data.csv", delimiter = ",")
4 unscaled_inputs_all = raw_csv_data[:,1:-1] ## takes all data except first and last columns (Id and target columns)
5 targets_all = raw_csv_data[:,-1]
File ~\Anaconda3\envs\py3-TF2.0\lib\site-packages\numpy\lib\npyio.py:1148, in loadtxt(fname, dtype, comments, delimiter, converters, skiprows, usecols, unpack, ndmin, encoding, max_rows, like)
1143 # read data in chunks and fill it into an array via resize
1144 # over-allocating and shrinking the array later may be faster but is
1145 # probably not relevant compared to the cost of actually reading and
1146 # converting the data
1147 X = None
-> 1148 for x in read_data(_loadtxt_chunksize):
1149 if X is None:
1150 X = np.array(x, dtype)
File ~\Anaconda3\envs\py3-TF2.0\lib\site-packages\numpy\lib\npyio.py:999, in loadtxt.<locals>.read_data(chunk_size)
995 raise ValueError("Wrong number of columns at line %d"
996 % line_num)
998 # Convert each value according to its column and store
--> 999 items = [conv(val) for (conv, val) in zip(converters, vals)]
1001 # Then pack it according to the dtype's nesting
1002 items = pack_items(items, packing)
File ~\Anaconda3\envs\py3-TF2.0\lib\site-packages\numpy\lib\npyio.py:999, in <listcomp>(.0)
995 raise ValueError("Wrong number of columns at line %d"
996 % line_num)
998 # Convert each value according to its column and store
--> 999 items = [conv(val) for (conv, val) in zip(converters, vals)]
1001 # Then pack it according to the dtype's nesting
1002 items = pack_items(items, packing)
File ~\Anaconda3\envs\py3-TF2.0\lib\site-packages\numpy\lib\npyio.py:736, in _getconv.<locals>.floatconv(x)
734 if '0x' in x:
735 return float.fromhex(x)
--> 736 return float(x)
ValueError: could not convert string to float: ''
Do you have any idea why I am getting this? I get a similar error if I remove the delimiter argument, but then the message quotes the first row of the file instead. I started removing the blank rows from the file (every other row of the 28,000-row CSV file is blank), but that was going to take a while, so I thought I should ask what the best course of action is.
Any help would be greatly appreciated.
CodePudding user response:
I tried to reproduce the error but could not; the code ran without any problems. I got your data from here. Download that particular version of the file and see whether the error still persists.
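If you want to see how your copy of the file differs, a quick check like the one below lists the lines that float() cannot parse, which is where np.loadtxt fails. This is only a sketch: the helper name find_bad_lines is made up for illustration, and the sample uses a small in-memory string rather than the real Audiobooks_data.csv.

```python
import csv
import io


def find_bad_lines(lines, delimiter=","):
    """Return (line_number, row) pairs that contain a field float() cannot parse.

    `lines` is any iterable of text lines, e.g. an open file object.
    """
    bad = []
    reader = csv.reader(lines, delimiter=delimiter)
    for row in reader:
        if not row:
            # A completely empty line produces no fields; skip it here.
            continue
        try:
            for field in row:
                float(field)
        except ValueError:
            # reader.line_num is the number of the source line just read
            bad.append((reader.line_num, row))
    return bad


# Hypothetical sample: line 2 holds only delimiters, line 4 is empty,
# and line 5 has a non-numeric field.
sample = io.StringIO("1,2.5,0\n,,\n2,3.5,1\n\n3,abc,0\n")
bad_lines = find_bad_lines(sample)
print(bad_lines)
```

To run it on the real file, open it with open("Audiobooks_data.csv", newline="") and pass the file object to find_bad_lines. If the "blank" rows in your file turn out to be rows of bare commas rather than truly empty lines, that would explain the '' in the error message, but that is a guess until you look at the output.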
CodePudding user response:
Rather than NumPy, you can try reading the CSV file with the pandas library, using the function pandas.read_csv().
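As for the root of the problem, the message "could not convert string to float: ''" means np.loadtxt reached an empty field where it expected a number, so the file seems to contain rows with empty or non-numeric entries mixed in with the numeric rows. Remember that all elements of a NumPy array must be of the same type.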
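A rough sketch of the pandas route is below. It assumes the file has no header row (your np.loadtxt call does not skip one) and that rows made up entirely of empty fields can simply be dropped. The helper name load_numeric_csv is made up for illustration, and the sample is an in-memory string rather than the real file.

```python
import io

import numpy as np
import pandas as pd


def load_numeric_csv(source):
    """Read a header-less CSV into a float array, dropping all-empty rows."""
    # Assumption: no header row. Truly blank lines are skipped by read_csv.
    df = pd.read_csv(source, header=None, skip_blank_lines=True)
    # Turn anything that is not a number into NaN instead of raising.
    df = df.apply(pd.to_numeric, errors="coerce")
    # Drop rows where every field is missing (e.g. a line of bare commas).
    df = df.dropna(how="all")
    return df.to_numpy(dtype=float)


# Hypothetical sample: line 2 holds only delimiters and line 4 is empty.
sample = io.StringIO("1,2.5,0\n,,\n2,3.5,1\n\n3,4.5,0\n")
raw_csv_data = load_numeric_csv(sample)

# Same slicing as in the question: drop the first and last columns for the inputs.
unscaled_inputs_all = raw_csv_data[:, 1:-1]
targets_all = raw_csv_data[:, -1]
print(raw_csv_data.shape, targets_all)
```

For the real data you would pass "Audiobooks_data.csv" instead of the sample. Note that errors="coerce" silently turns stray text into NaN, and rows that are only partly empty are kept, so it is worth checking np.isnan(raw_csv_data).any() before scaling the inputs.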