Reading from CSV file without tokenizing words into letters and numbers into digits

Time:11-10

I am downloading a CSV file and then reading it with the csv module. For some reason, words and numbers get split into individual letters and single digits. The exceptions are "1 Mo", "3 Mo", etc., which come through intact.

I am getting the CSV file from here:

url = "https://home.treasury.gov/resource-center/data-chart-center/interest-rates/daily-treasury-rates.csv/2022/all?type=daily_treasury_yield_curve&field_tdr_date_value=2022&page&_format=csv"

I use Python 3.10 and the code looks as follows:

from urllib.request import urlopen
import csv

response = urlopen(url)
content = response.read().decode('utf-8')
csv_data = csv.reader(content, delimiter=',')
for row in csv_data:
    print(row)

Here is what I am getting:

['D']
['a']
['t']
['e']
['','']
['1 Mo']
['','']
['2 Mo']
['','']
['3 Mo']
['','']
.
.
.
['30 Yr']
[]
['1']
['1']
['/']
['0']
['8']
['/']
.
.
.

I tried different delimiters, but it did not help.

P.S. When I save the CSV file to disk first and then open it, everything works properly. But I would rather avoid this extra step.

Answer:

Check out the documentation for csv.reader:

csv.reader(csvfile, dialect='excel', **fmtparams)

...csvfile can be any object which supports the iterator protocol and returns a string each time its __next__() method is called -- file objects and list objects are both suitable...

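Notice that your variable content is a string, not a file. In Python, a string is iterable, but iterating over it yields one character at a time, not one line at a time. So csv.reader treats every character as a separate line, which is why you get one row per letter or digit (and a lone comma becomes a row of two empty fields, ['', '']). Quoted headers such as "1 Mo" stay intact because, after an opening quote, the reader keeps pulling in further "lines" (here, single characters) until it reaches the closing quote. You probably want to convert your long CSV string into a list of lines, so that each iteration step the reader performs internally yields the next line instead of the next character. This is also why your code works when you save the CSV to a file first: a file object opened in text mode is an iterator whose __next__() method returns the next line of input.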
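A quick offline way to see the difference is to hand csv.reader the same small piece of CSV text twice: once as a plain string and once as a list of lines. The sample text below is made up for illustration; it only mimics the shape of the Treasury file (an unquoted Date column followed by quoted headers like "1 Mo").

```python
import csv

# Made-up sample CSV text, shaped like the start of the Treasury file.
text = 'Date,"1 Mo"\n11/08/2022,3.75\n'

# Passing the string directly: iterating over a str yields single characters,
# so csv.reader treats every character as its own "line".
per_char_rows = list(csv.reader(text))

# Passing a list of lines: each iteration step yields a whole line.
per_line_rows = list(csv.reader(text.splitlines()))

print(per_char_rows[:4])   # → [['D'], ['a'], ['t'], ['e']]
print(per_char_rows[4:6])  # the lone comma, then the quoted header
print(per_line_rows)       # → [['Date', '1 Mo'], ['11/08/2022', '3.75']]
```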

To accomplish this, try replacing the line that assigns content with the following:

content = response.read().decode('utf-8').split("\n")
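Splitting on "\n" works for this file. As an alternative sketch (not part of the original answer), the decoded text can be wrapped in io.StringIO, or the binary response can be wrapped in io.TextIOWrapper so it is decoded and parsed line by line without first building one big string. The helper names read_csv_rows and rows_from_text are hypothetical, and the sample bytes stand in for the real download so the example runs offline.

```python
import csv
import io
from typing import BinaryIO, List


def read_csv_rows(binary_stream: BinaryIO, encoding: str = "utf-8") -> List[List[str]]:
    """Hypothetical helper: parse CSV rows from a binary stream.

    The stream can be the object returned by urlopen(). io.TextIOWrapper
    decodes the bytes and yields one line per iteration step, which is what
    csv.reader expects. newline="" follows the csv docs' recommendation so
    line endings are passed to the reader untranslated.
    """
    text_stream = io.TextIOWrapper(binary_stream, encoding=encoding, newline="")
    return list(csv.reader(text_stream))


def rows_from_text(text: str) -> List[List[str]]:
    """Hypothetical helper: parse rows from an already decoded string.

    io.StringIO makes the string behave like a text file again, so iterating
    over it yields lines rather than characters.
    """
    return list(csv.reader(io.StringIO(text, newline="")))


if __name__ == "__main__":
    # Made-up sample bytes standing in for the downloaded file.
    sample = b'Date,"1 Mo","2 Mo"\r\n11/08/2022,3.75,3.93\r\n'
    print(read_csv_rows(io.BytesIO(sample)))
    print(rows_from_text(sample.decode("utf-8")))

    # With the question's URL, the call would look like:
    #     with urlopen(url) as response:
    #         rows = read_csv_rows(response)
```

One small difference from split("\n"): when the text ends with a newline, split("\n") leaves a trailing empty string, which csv.reader turns into a final empty row [], while the file-like wrappers do not. Note also that a TextIOWrapper takes ownership of the stream it wraps, so the response is closed when the wrapper is closed or garbage-collected.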