Why am I getting a value error when I try to load the csv file?

Time:04-21

I have been doing a course on Python and Machine Learning. I typed in the code below and I got a value error:

import numpy as np
from sklearn import preprocessing 
raw_csv_data = np.loadtxt("Audiobooks_data.csv", delimiter = ",")
unscaled_inputs_all = raw_csv_data[:,1:-1] 
targets_all = raw_csv_data[:,-1] 

The error message said:

ValueError                                Traceback (most recent call last)
Input In [2], in <cell line: 3>()
      1 import numpy as np
      2 from sklearn import preprocessing ## should standardize inputs using sklearn, accuracy decreases by 10% otherwise 
----> 3 raw_csv_data = np.loadtxt("Audiobooks_data.csv", delimiter = ",")
      4 unscaled_inputs_all = raw_csv_data[:,1:-1] ## takes all data except first and last columns (Id and target columns)
      5 targets_all = raw_csv_data[:,-1]

File ~\Anaconda3\envs\py3-TF2.0\lib\site-packages\numpy\lib\npyio.py:1148, in loadtxt(fname, dtype, comments, delimiter, converters, skiprows, usecols, unpack, ndmin, encoding, max_rows, like)
   1143 # read data in chunks and fill it into an array via resize
   1144 # over-allocating and shrinking the array later may be faster but is
   1145 # probably not relevant compared to the cost of actually reading and
   1146 # converting the data
   1147 X = None
-> 1148 for x in read_data(_loadtxt_chunksize):
   1149     if X is None:
   1150         X = np.array(x, dtype)

File ~\Anaconda3\envs\py3-TF2.0\lib\site-packages\numpy\lib\npyio.py:999, in loadtxt.<locals>.read_data(chunk_size)
    995     raise ValueError("Wrong number of columns at line %d"
    996                      % line_num)
    998 # Convert each value according to its column and store
--> 999 items = [conv(val) for (conv, val) in zip(converters, vals)]
   1001 # Then pack it according to the dtype's nesting
   1002 items = pack_items(items, packing)

File ~\Anaconda3\envs\py3-TF2.0\lib\site-packages\numpy\lib\npyio.py:999, in <listcomp>(.0)
    995     raise ValueError("Wrong number of columns at line %d"
    996                      % line_num)
    998 # Convert each value according to its column and store
--> 999 items = [conv(val) for (conv, val) in zip(converters, vals)]
   1001 # Then pack it according to the dtype's nesting
   1002 items = pack_items(items, packing)

File ~\Anaconda3\envs\py3-TF2.0\lib\site-packages\numpy\lib\npyio.py:736, in _getconv.<locals>.floatconv(x)
    734 if '0x' in x:
    735     return float.fromhex(x)
--> 736 return float(x)

ValueError: could not convert string to float: ''

Do you have any idea why I am getting it? I get a similar error if I remove the delimiter, but it quotes the first row of the file instead. I tried to remove the blank rows of the file (which form every other row of the 28,000 row CSV file), but it was going to take a while, and I thought I should see what the best course of action is.

Any help would be greatly appreciated.

CodePudding user response:

I tried to reproduce the error but could not; the code ran straight away without any problem. I got your data from here. Download that particular version of the file and see whether the error still persists.

CodePudding user response:

Rather than NumPy, you can try reading the CSV file with the pandas library, using the function pandas.read_csv().
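A minimal sketch of the pandas route, assuming the file has no header row and that the blank rows consist only of delimiters (for example `,,,`). The sample data and the helper name `load_numeric_csv` are made up for illustration and are not part of the course code:

```python
import io

import pandas as pd


def load_numeric_csv(source):
    """Hypothetical helper: read a headerless CSV and drop rows that are entirely empty."""
    # header=None is an assumption: the first line is treated as data, not column names.
    df = pd.read_csv(source, header=None)
    # Rows made up only of delimiters are parsed as all-NaN; remove them.
    df = df.dropna(how="all")
    return df.to_numpy(dtype=float)


# Made-up sample standing in for Audiobooks_data.csv: every other row is blank.
sample = io.StringIO("1,10.5,3.0,0\n,,,\n2,20.0,4.5,1\n,,,\n")
raw_csv_data = load_numeric_csv(sample)

unscaled_inputs_all = raw_csv_data[:, 1:-1]  # all columns except the first and last
targets_all = raw_csv_data[:, -1]            # last column only
```

With the real file you would pass the path instead of the in-memory sample, e.g. load_numeric_csv("Audiobooks_data.csv").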

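As for the root of the problem, the last line of the traceback shows that NumPy tried to convert an empty string ('') to a float. That suggests the file contains empty fields, most likely from the blank rows you mention, mixed in with the numeric rows. Remember that all elements of a NumPy array must be of the same type, so np.loadtxt cannot store a value it is unable to convert to float.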
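If you would rather stay with NumPy, one option is to filter out the empty rows before np.loadtxt sees them. This is only a sketch under the assumption that the blank rows are either truly empty or contain nothing but delimiters; the helper name `load_skipping_empty_rows` and the sample rows are invented for the example:

```python
import numpy as np


def load_skipping_empty_rows(lines, delimiter=","):
    """Hypothetical helper: pass only rows with at least one non-empty field to np.loadtxt."""
    kept = [
        line.strip()
        for line in lines
        if any(field.strip() for field in line.split(delimiter))
    ]
    # ndmin=2 keeps the result two-dimensional even if only one row survives.
    return np.loadtxt(kept, delimiter=delimiter, ndmin=2)


# float("") raises the same kind of ValueError shown at the bottom of the traceback.
try:
    float("")
except ValueError as exc:
    message = str(exc)

# Made-up rows: two data rows, one delimiter-only row, one truly empty row.
sample_lines = ["1,10.5,3.0,0\n", ",,,\n", "\n", "2,20.0,4.5,1\n"]
raw_csv_data = load_skipping_empty_rows(sample_lines)
```

For the real file you could open it and pass the file object, since iterating over it yields one line at a time. Rows that are only partly empty would still fail with this approach; np.genfromtxt, which fills missing values with nan by default, may be a better fit in that case.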
