I have the following code:
import unicodecsv
CSV_PARAMS = dict(delimiter=",", quotechar='"', lineterminator='\n')
unireader = unicodecsv.reader(open('sample.csv', 'rb'), **CSV_PARAMS)
for line in unireader:
    print(line)
and it prints
['\ufeff"003', 'word one"']
['003,word two']
['003,word three']
The CSV looks like this
"003,word one"
"003,word two"
"003,word three"
I can't figure out why the first row has \ufeff (which I believe is a file marker). Moreover, there is a " at the beginning of the first row.
The CSV file comes from a client, so I can't dictate how they save it. I'm looking to fix my code so that it handles the encoding.
Note: I have already tried passing encoding='utf8' to CSV_PARAMS, and it didn't solve the problem.
CodePudding user response:
encoding='utf-8-sig' will remove the UTF-8-encoded BOM (byte order mark) that is used as a UTF-8 signature in some files:
import unicodecsv
with open('sample.csv', 'rb') as f:
    r = unicodecsv.reader(f, encoding='utf-8-sig')
    for line in r:
        print(line)
Output:
['003,word one']
['003,word two']
['003,word three']
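To see what the codec is doing without needing the file, here is a minimal sketch. The raw bytes are an assumption modelled on the first line of the CSV in the question, not the client's actual file:

```python
import codecs

# Hypothetical sample: first line of the question's CSV, saved with a UTF-8 BOM.
raw = codecs.BOM_UTF8 + b'"003,word one"\n'

plain = raw.decode('utf-8')        # BOM survives as the character U+FEFF
signed = raw.decode('utf-8-sig')   # BOM is stripped if it is present

print(repr(plain[0]))    # the BOM character
print(repr(signed[0]))   # the opening quote

# 'utf-8-sig' is also safe for data that has no BOM at all.
no_bom = b'"003,word one"\n'.decode('utf-8-sig')
print(no_bom == signed)
```

Since 'utf-8-sig' only skips a BOM when one is there, it should be a reasonable default when you can't control how the client saves the file.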
But why are you using the third-party unicodecsv with Python 3? The built-in csv module handles Unicode correctly:
import csv
# Note, newline='' is a documented requirement for the csv module
# for reading and writing CSV files.
with open('sample.csv', encoding='utf-8-sig', newline='') as f:
    r = csv.reader(f)
    for line in r:
        print(line)
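This also explains the stray " in the question's first row: with the BOM in front, the quote is no longer the first character of the field, so the csv parser treats it as ordinary data and splits on the comma. A rough in-memory sketch of that, where the sample bytes and the read_rows helper are made up for illustration:

```python
import codecs
import csv
import io

# Hypothetical sample data with a UTF-8 BOM, modelled on the question's CSV.
raw = codecs.BOM_UTF8 + b'"003,word one"\n"003,word two"\n'

def read_rows(data, encoding):
    """Parse CSV bytes with the given encoding (illustrative helper)."""
    # newline='' mirrors the open() call used with the csv module above.
    text = io.TextIOWrapper(io.BytesIO(data), encoding=encoding, newline='')
    return list(csv.reader(text))

with_bom = read_rows(raw, 'utf-8')         # BOM kept: first row is split at the comma
without_bom = read_rows(raw, 'utf-8-sig')  # BOM stripped: quoting works as intended

print(with_bom)
print(without_bom)
```

Only the first row is affected, because the BOM appears once at the very start of the file.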