I wanted to read data from a csv file using python but after using the following code there are some extra characters and braces in the text which is not in the original data. Please help to remove it.
import csv
with open("data.csv",encoding="utf8") as csvDataFile:
csvReader = csv.reader(csvDataFile)
for row in csvReader:
print(row)
What is displayed after reading is:- ['\ufeffwww.aslteramo.it']
CodePudding user response:
This is utf-8 encoding with a Byte Order Mark (BOM) - which is used as a signature in windows.
Open the file using the utf-8-sig
encoding instead of utf8
CodePudding user response:
\ufeff
is a UTF-8 BOM (also known as 'ZERO WIDTH NO-BREAK SPACE' character).
It's sometimes used to indicate that the file is in UTF-8 format.
You could use str.replace('\ufeff', '')
in your code to get rid of it. Like this:
import csv
with open("data.csv",encoding="utf8") as csvDataFile:
csvReader = csv.reader(csvDataFile)
for row in csvReader:
print([col.replace('\ufeff', '') for col in row])
Another solution is to open the file with 'utf-8-sig' encoding instead of 'utf-8' encoding.
By the way, the braces are a added because row
is a list. If your CSV file only has one column, you could select the first item from each row like this:
print(row[0].replace('\ufeff', ''))