Why am I getting a value error when I try to load the csv file?

I have been doing a course on Python and Machine Learning. I typed in the code below and got a ValueError:

import numpy as np
from sklearn import preprocessing 
raw_csv_data = np.loadtxt("Audiobooks_data.csv", delimiter = ",")
unscaled_inputs_all = raw_csv_data[:,1:-1] 
targets_all = raw_csv_data[:,-1]

The error message said:

ValueError                                Traceback (most recent call last)
Input In [2], in <cell line: 3>()
      1 import numpy as np
      2 from sklearn import preprocessing ## should standardize inputs using sklearn, accuracy decreases by 10% otherwise 
----> 3 raw_csv_data = np.loadtxt("Audiobooks_data.csv", delimiter = ",")
      4 unscaled_inputs_all = raw_csv_data[:,1:-1] ## takes all data except first and last columns (Id and target columns)
      5 targets_all = raw_csv_data[:,-1]

File ~\Anaconda3\envs\py3-TF2.0\lib\site-packages\numpy\lib\npyio.py:1148, in loadtxt(fname, dtype, comments, delimiter, converters, skiprows, usecols, unpack, ndmin, encoding, max_rows, like)
   1143 # read data in chunks and fill it into an array via resize
   1144 # over-allocating and shrinking the array later may be faster but is
   1145 # probably not relevant compared to the cost of actually reading and
   1146 # converting the data
   1147 X = None
-> 1148 for x in read_data(_loadtxt_chunksize):
   1149     if X is None:
   1150         X = np.array(x, dtype)

File ~\Anaconda3\envs\py3-TF2.0\lib\site-packages\numpy\lib\npyio.py:999, in loadtxt.<locals>.read_data(chunk_size)
    995     raise ValueError("Wrong number of columns at line %d"
    996                      % line_num)
    998 # Convert each value according to its column and store
--> 999 items = [conv(val) for (conv, val) in zip(converters, vals)]
   1001 # Then pack it according to the dtype's nesting
   1002 items = pack_items(items, packing)

File ~\Anaconda3\envs\py3-TF2.0\lib\site-packages\numpy\lib\npyio.py:999, in <listcomp>(.0)
    995     raise ValueError("Wrong number of columns at line %d"
    996                      % line_num)
    998 # Convert each value according to its column and store
--> 999 items = [conv(val) for (conv, val) in zip(converters, vals)]
   1001 # Then pack it according to the dtype's nesting
   1002 items = pack_items(items, packing)

File ~\Anaconda3\envs\py3-TF2.0\lib\site-packages\numpy\lib\npyio.py:736, in _getconv.<locals>.floatconv(x)
    734 if '0x' in x:
    735     return float.fromhex(x)
--> 736 return float(x)

ValueError: could not convert string to float: ''

Do you have any idea why I am getting this error? If I remove the delimiter argument I get a similar error, but the message quotes the first row of the file instead. I started removing the blank rows from the file (they make up every other row of the 28,000-row CSV file), but that was going to take a while, so I thought I should first ask what the best course of action is.

Any help would be greatly appreciated.

CodePudding user response：

I tried to reproduce the error but could not: the code ran straight away. I got your data from here. Download that version of the file, use it, and see whether the error still persists.

CodePudding user response：

Rather than NumPy, you can try reading the CSV file with the pandas library, using the function pandas.read_csv().
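A minimal sketch of that approach, under a few assumptions: the file has no header row (the original np.loadtxt call does not skip one), and the "blank" rows are either truly empty lines or lines containing only commas. The helper name load_numeric_csv and the in-memory sample are made up for illustration; for the real file you would pass "Audiobooks_data.csv" instead of the sample.

```python
import io

import pandas as pd


def load_numeric_csv(source):
    """Read a header-less numeric CSV and drop rows that are entirely empty."""
    # Assumption: no header row. Completely blank lines are skipped by
    # read_csv by default (skip_blank_lines=True).
    frame = pd.read_csv(source, header=None)
    # Rows made only of commas (",,") are read as all-NaN rows; drop them.
    frame = frame.dropna(how="all")
    # Convert to a float array, like np.loadtxt would have produced.
    return frame.to_numpy(dtype=float)


# Small stand-in for the real file: one comma-only row and one blank line.
sample = io.StringIO("1,10.0,0\n,,\n2,20.5,1\n\n3,30.0,0\n")
data = load_numeric_csv(sample)

unscaled_inputs_all = data[:, 1:-1]  # everything except first and last column
targets_all = data[:, -1]            # last column
```

If a row is only partly empty, its missing fields stay as NaN in the array, so it is worth checking for NaN values before scaling. If a field holds non-numeric text, the conversion to float will still raise a ValueError, which at least tells you the file contains more than blank rows.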

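As for the root of the problem, the last line of the traceback shows float() being called on an empty string (''), which suggests that at least one field in the file is empty, for example in a row that contains only commas. np.loadtxt converts every field to float by default and cannot convert an empty field. Remember that all elements of a NumPy array must share the same type.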
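To check which rows are actually causing the failure, one option is to scan the file with the standard csv module and report every row that has a field float() cannot parse. This is only a sketch: find_bad_rows and the sample data are hypothetical names and values, not anything from the course material.

```python
import csv
import io


def find_bad_rows(lines, limit=5):
    """Return up to `limit` (line_number, row) pairs with a non-float field."""
    bad = []
    reader = csv.reader(lines)
    for row in reader:
        try:
            for field in row:
                float(field)  # raises ValueError on '' or non-numeric text
        except ValueError:
            bad.append((reader.line_num, row))
            if len(bad) >= limit:
                break
    return bad


# Stand-in data: line 2 is comma-only, line 3 has text, line 4 is blank.
sample = io.StringIO("1,10.0,0\n,,\n2,abc,1\n\n3,30.0,0\n")
bad_rows = find_bad_rows(sample)
for line_number, row in bad_rows:
    print(line_number, row)
```

For the real file, open it with open("Audiobooks_data.csv", newline="") and pass the file object to find_bad_rows. Completely blank lines are not reported, because they contain no fields to convert.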