I have a `test.csv` file as follows:
```
"N";"INFO"
"1";"<a href="www.google.it">www.google.it</a>"
```
I use the following program to print out the contents of the CSV file:
```python
import csv

with open('test.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile, delimiter=';')
    for p in reader:
        print("%s %s" % (p['N'], p['INFO']))
```
The output is:
```
1 <a href=www.google.it">www.google.it</a>"
```
The reason is probably that the `INFO` field contains "nested" double quotes that are not escaped (in standard CSV, a `"` inside a quoted field has to be doubled as `""`). However, since the delimiter is `;`, I would like the library to simply remove the double quote at the beginning and at the end of the `INFO` field and keep the rest of the string intact.
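To make this diagnosis concrete, here is a small sketch that feeds the question's data to `csv.DictReader` from an in-memory string (`io.StringIO` stands in for `test.csv`, an assumption made only so the example is self-contained). With the default settings the unescaped inner quotes produce exactly the mangled output above, while the same record with the inner quotes doubled parses as intended. The doubled-quote version is only there to illustrate the cause; it is not a fix, since the file must stay unchanged.

```python
import csv
import io

# The record from the question: inner quotes are NOT escaped.
raw = '"N";"INFO"\n"1";"<a href="www.google.it">www.google.it</a>"\n'

# The same record written the standard way: inner quotes doubled ("").
escaped = '"N";"INFO"\n"1";"<a href=""www.google.it"">www.google.it</a>"\n'


def parse_info(text):
    """Return the INFO field of the first data row, using default quoting."""
    reader = csv.DictReader(io.StringIO(text), delimiter=';')
    return next(reader)['INFO']


print(parse_info(raw))      # <a href=www.google.it">www.google.it</a>"
print(parse_info(escaped))  # <a href="www.google.it">www.google.it</a>
```

In the first case the parser treats the second `"` as the end of the quoted part and reads the rest of the field as unquoted text, which is why the quotes end up in the wrong places.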
In other words, I would like the output of the program to be:
```
1 <a href="www.google.it">www.google.it</a>
```
How can I fix this without modifying the `test.csv` file?
Answer:
One possibility is to use the `csv` module with `csv.QUOTE_NONE`, then strip the outer quotes (from both the field names and the values) manually:
```python
import csv


def strip_outer_quotes(s):
    """Strip an outer pair of quotes (only) from a string. If it is not
    quoted (or not a string at all, e.g. None for a missing field),
    it is returned unchanged."""
    if isinstance(s, str) and len(s) >= 2 and s[0] == s[-1] == '"':
        return s[1:-1]
    return s


def my_csv_reader(fh):
    """Thin wrapper around csv.DictReader to handle fields which are
    quoted but contain unescaped " characters."""
    # With QUOTE_NONE, " is an ordinary character: fields are split on ';'
    # only and keep their surrounding quotes.
    reader = csv.DictReader(fh, delimiter=';', quoting=csv.QUOTE_NONE)
    # Accessing fieldnames reads the header row (None for an empty file).
    if reader.fieldnames is not None:
        reader.fieldnames = [strip_outer_quotes(fn) for fn in reader.fieldnames]
    for row in reader:
        yield {k: strip_outer_quotes(v) for k, v in row.items()}


with open('test.csv', newline='') as csvfile:
    reader = my_csv_reader(csvfile)
    for p in reader:
        print("%s %s" % (p['N'], p['INFO']))
```
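As a quick check of this approach, the sketch below runs the same two helpers on the question's data held in an `io.StringIO` object (an assumption so the example does not need `test.csv` on disk). It also illustrates a limitation of `csv.QUOTE_NONE`: because quotes are no longer special, a `;` inside a quoted field is treated as a delimiter, so the approach only works if fields never contain the delimiter. The `tricky` input is a hypothetical example, not data from the question.

```python
import csv
import io


def strip_outer_quotes(s):
    """Strip one outer pair of double quotes; leave anything else unchanged."""
    if isinstance(s, str) and len(s) >= 2 and s[0] == s[-1] == '"':
        return s[1:-1]
    return s


def my_csv_reader(fh):
    """Split on ';' only (quotes are ordinary characters), then strip outer quotes."""
    reader = csv.DictReader(fh, delimiter=';', quoting=csv.QUOTE_NONE)
    if reader.fieldnames is not None:
        reader.fieldnames = [strip_outer_quotes(fn) for fn in reader.fieldnames]
    for row in reader:
        yield {k: strip_outer_quotes(v) for k, v in row.items()}


# The data from the question, held in memory instead of test.csv.
data = '"N";"INFO"\n"1";"<a href="www.google.it">www.google.it</a>"\n'
rows = list(my_csv_reader(io.StringIO(data)))
for p in rows:
    print("%s %s" % (p['N'], p['INFO']))
# 1 <a href="www.google.it">www.google.it</a>

# Hypothetical limitation: a ';' inside a quoted field is still a delimiter,
# so the value is split and the extra piece lands under DictReader's restkey (None).
tricky = '"N";"INFO"\n"2";"a;b"\n'
print(list(my_csv_reader(io.StringIO(tricky))))
# [{'N': '2', 'INFO': '"a', None: ['b"']}]
```

If the real data can contain `;` inside a field, a different strategy would be needed, since no setting of `QUOTE_NONE` can tell a literal `;` apart from a delimiter.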
Note: instead of `my_csv_reader`, it is probably better to name the function after the source of this particular CSV variant, e.g. `acme_csv_reader` or similar.