\ufeff is appearing while reading csv using unicodecsv module

Time:06-02

I have the following code:

import unicodecsv
CSV_PARAMS = dict(delimiter=",", quotechar='"', lineterminator='\n')
unireader = unicodecsv.reader(open('sample.csv', 'rb'), **CSV_PARAMS)
for line in unireader:
    print(line)

and it prints

['\ufeff"003', 'word one"']
['003,word two']
['003,word three']

The CSV looks like this

"003,word one"
"003,word two"
"003,word three"

I am unable to figure out why the first row has \ufeff (which I believe is a file marker). Moreover, there is a " at the beginning of the first row.

The CSV file comes from a client, so I can't dictate how they save it. I am looking to fix my code so that it can handle the encoding.

Note: I have already tried passing encoding='utf8' to CSV_PARAMS and it didn't solve the problem

CodePudding user response:

Plain 'utf-8' decodes the BOM (byte order mark) that some programs write as a UTF-8 signature at the start of a file, but it leaves it in the data as the character U+FEFF. encoding='utf-8-sig' removes it:

import unicodecsv

with open('sample.csv','rb') as f:
    r = unicodecsv.reader(f, encoding='utf-8-sig')
    for line in r:
        print(line)

Output:

['003,word one']
['003,word two']
['003,word three']
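
As for the stray " and the split first row: with the BOM in front, the opening quote is no longer the first character of the field, so the parser treats it as ordinary text and splits on the comma. Here is a minimal sketch of that using the built-in csv module on in-memory data. The bytes below are my reconstruction of what your file probably contains, and parse() is just a helper name I made up:

```python
import csv
import io

# Assumed file contents: the three rows from the question,
# preceded by the UTF-8 BOM bytes EF BB BF.
raw = b'\xef\xbb\xbf"003,word one"\n"003,word two"\n"003,word three"\n'

def parse(data, encoding):
    """Decode the bytes with the given encoding and return the parsed CSV rows."""
    text = data.decode(encoding)
    return list(csv.reader(io.StringIO(text, newline='')))

with_bom = parse(raw, 'utf-8')         # BOM survives as '\ufeff'
without_bom = parse(raw, 'utf-8-sig')  # BOM is stripped while decoding

print(with_bom[0])     # ['\ufeff"003', 'word one"']
print(without_bom[0])  # ['003,word one']
```

Only the first row is affected because the BOM appears once, at the very start of the file.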

But why are you using the third-party unicodecsv with Python 3? The built-in csv module handles Unicode correctly:

import csv

# Note, newline='' is a documented requirement for the csv module
# for reading and writing CSV files.
with open('sample.csv', encoding='utf-8-sig', newline='') as f:
    r = csv.reader(f)
    for line in r:
        print(line)
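
Since you can't control how the client saves the file: 'utf-8-sig' also reads files that have no BOM at all, so it should be safe to use either way. A quick sketch to check this, writing two temporary files (one with a BOM, one without) and reading both back. read_rows() and the sample content are made up for the illustration:

```python
import csv
import os
import tempfile

def read_rows(path):
    """Read a CSV file as UTF-8, dropping a leading BOM if one is present."""
    with open(path, encoding='utf-8-sig', newline='') as f:
        return list(csv.reader(f))

content = '"003,word one"\n"003,word two"\n'
results = []
for bom in (b'\xef\xbb\xbf', b''):  # first with a BOM, then without
    with tempfile.NamedTemporaryFile('wb', suffix='.csv', delete=False) as tmp:
        tmp.write(bom + content.encode('utf-8'))
    try:
        results.append(read_rows(tmp.name))
    finally:
        os.remove(tmp.name)

# Both files parse to the same rows.
print(results[0] == results[1])
```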