CSV reader incorrectly parses tabs after quotation marks

Time:11-19

I am using the CSV reader to read a TSV in Python. The code is:

import csv

f = csv.reader(open('sample.csv', newline=''), delimiter='\t')
for chunk in f:
    print(chunk)

The header and one row from the tab-separated file look like this:

doc unit1_toks unit2_toks unit1_txt1 unit2_txt2 s1_toks s2_toks unit1_sent unit2_sent dir
GUM_bio_galois 156-160 161-170 " We zouden dan voorstellen dat de auteur al zijn werk zou moeten publiceren 107-182 107-182 Poisson declared Galois ' work " incomprehensible " , declaring that " [ Galois ' ] argument is not sufficient . " [ 16 ] Poisson declared Galois ' work " incomprehensible " , declaring that " [ Galois ' ] argument would then suggest that the author should publish the opinion . " [ 16 ] 1>2

I am getting the following output (the CSV reader misses some of the tabs and merges several fields into one):

['GUM_bio_galois', 
'156-160', 
'161-170', 
' We zouden dan voorstellen\tdat de auteur al zijn werk zou moeten publiceren\t107-182\t107-182\tPoisson declared Galois \' work  incomprehensible " , declaring that " [ Galois \' ] argument is not sufficient . " [ 16 ]', 
'Poisson declared Galois \' work " incomprehensible " , declaring that " [ Galois \' ] argument would then suggest that the author should publish the opinion . " [ 16 ]', 
'1>2']

I want it to look like this:

['GUM_bio_galois', 
'156-160', 
'161-170', 
'" We zouden dan voorstellen',
'dat de auteur al zijn werk zou moeten publiceren',
'107-182',
'107-182',
'Poisson declared Galois \' work " incomprehensible " , declaring that " [ Galois \' ] argument is not sufficient . " [ 16 ]', 
'Poisson declared Galois \' work " incomprehensible " , declaring that " [ Galois \' ] argument would then suggest that the author should publish the opinion . " [ 16 ]', 
'1>2']

How can I get the CSV reader to handle incomplete quotes and retain them in my output?

CodePudding user response:

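import csv

with open('sample.csv', newline='') as f:
    # QUOTE_NONE makes the reader treat " as an ordinary character,
    # so an unmatched quote no longer swallows the tabs that follow it.
    rdr = csv.reader(f, quoting=csv.QUOTE_NONE, delimiter='\t')
    header = next(rdr)  # the first row holds the column names
    for line in rdr:
        print(line)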

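Or, using csv.DictReader:

import csv

with open('sample.csv', newline='') as f:
    # Extra keyword arguments are passed through to the underlying csv.reader.
    rdr = csv.DictReader(f, quoting=csv.QUOTE_NONE, delimiter='\t')
    for line in rdr:
        print(line)  # each row is a dict keyed by the header names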
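
To see why quoting=csv.QUOTE_NONE helps, here is a minimal sketch that parses the same text twice, once with the default quoting and once with QUOTE_NONE. The in-memory sample and the helper name parse are made up for illustration and are not the file from the question; the row only imitates it by starting one field with an unmatched double quote.

```python
import csv
import io

# Hypothetical stand-in for sample.csv: a header plus one row in which
# the second field begins with an unmatched double quote.
sample = (
    'id\tleft\tright\tn\tquote\tend\n'
    '1\t" We zouden\tdat de auteur\t2\tsaid " hi " there\tend\n'
)


def parse(text, **kwargs):
    """Parse tab-separated text with csv.reader and return all rows."""
    return list(csv.reader(io.StringIO(text, newline=''), delimiter='\t', **kwargs))


# Default quoting: a quote at the start of a field opens a quoted field,
# so the tabs after it are read as data until the next quote appears.
default_rows = parse(sample)

# QUOTE_NONE: quotes are ordinary characters, so every tab splits a field.
raw_rows = parse(sample, quoting=csv.QUOTE_NONE)

print(len(default_rows[1]), default_rows[1])
print(len(raw_rows[1]), raw_rows[1])
```

With QUOTE_NONE the data row should come back as six fields with every quote character preserved, while the default settings return fewer fields because several of them are merged into one.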