Python adding extra text and braces while reading from CSV file

Time:11-29

I wanted to read data from a CSV file using Python, but with the following code the output contains extra characters and brackets that are not in the original data. Please help me remove them.

import csv

with open("data.csv",encoding="utf8") as csvDataFile:
    csvReader = csv.reader(csvDataFile)

    for row in csvReader:
        print(row)

Screenshot of original data

What is displayed after reading is: ['\ufeffwww.aslteramo.it']

CodePudding user response:

This is UTF-8 encoding with a byte order mark (BOM), which is commonly written as a signature at the start of the file by Windows programs.

Open the file using the utf-8-sig encoding instead of utf8.
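
As a minimal sketch of that suggestion: the snippet below writes a small sample file that starts with a BOM and reads it back with both encodings. The temporary path, the sample contents, and the `read_rows` helper are assumptions made for illustration; only the `encoding` argument differs from the question's code.

```python
import csv
import os
import tempfile

# Build a small sample file that starts with a UTF-8 BOM (illustrative data).
path = os.path.join(tempfile.mkdtemp(), "data.csv")
with open(path, "wb") as f:
    f.write(b"\xef\xbb\xbfwww.aslteramo.it\r\n")

def read_rows(path, encoding):
    # Hypothetical helper: read every CSV row using the given encoding.
    # newline="" is what the csv module documentation recommends for files.
    with open(path, encoding=encoding, newline="") as csv_file:
        return list(csv.reader(csv_file))

rows_plain = read_rows(path, "utf8")      # BOM stays attached to the first field
rows_sig = read_rows(path, "utf-8-sig")   # BOM is skipped while decoding

print(rows_plain)
print(rows_sig)
```

The utf-8-sig codec also reads files that have no BOM, so it should be safe to use when you are not sure how the file was saved.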

CodePudding user response:

\ufeff is the byte order mark (BOM), the Unicode character U+FEFF, also known as 'ZERO WIDTH NO-BREAK SPACE'.

It's sometimes used to indicate that the file is in UTF-8 format.
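
To make this concrete, here is a small sketch of what the BOM looks like at the byte level. In UTF-8 the character is stored as the three bytes EF BB BF, which Python exposes as `codecs.BOM_UTF8`. The `starts_with_bom` helper and the sample bytes are hypothetical and only there for illustration.

```python
import codecs

# U+FEFF encoded as UTF-8 gives the three-byte signature EF BB BF.
bom_bytes = "\ufeff".encode("utf-8")

def starts_with_bom(raw):
    # Hypothetical helper: True if the raw bytes begin with the UTF-8 BOM.
    return raw.startswith(codecs.BOM_UTF8)

# Illustrative file contents: a BOM followed by one CSV line.
sample = codecs.BOM_UTF8 + b"www.aslteramo.it\r\n"

decoded_plain = sample.decode("utf-8")      # keeps '\ufeff' as the first character
decoded_sig = sample.decode("utf-8-sig")    # drops the leading BOM

print(bom_bytes, starts_with_bom(sample))
print(repr(decoded_plain))
print(repr(decoded_sig))
```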

You could use str.replace('\ufeff', '') in your code to get rid of it, like this:

import csv

with open("data.csv",encoding="utf8") as csvDataFile:
    csvReader = csv.reader(csvDataFile)
    for row in csvReader:
        print([col.replace('\ufeff', '') for col in row])

Another solution is to open the file with 'utf-8-sig' encoding instead of 'utf-8' encoding.

By the way, the brackets are added because row is a list. If your CSV file has only one column, you could select the first item from each row like this:

print(row[0].replace('\ufeff', ''))
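
A short sketch of why the brackets and quotes show up, assuming a one-column row like the one in the question: printing a list shows its repr, while printing an element or a joined string shows only the text. The variable names are illustrative.

```python
# What csv.reader yields for a one-column line (BOM already removed).
row = ["www.aslteramo.it"]

as_list = str(row)        # the text that print(row) displays
first_field = row[0]      # just the first column
joined = ", ".join(row)   # all columns in one string, useful with several columns

print(as_list)
print(first_field)
print(joined)
```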