I'm reading a big CSV file with the encoding/csv library.
But the file is a bit non-standard: it contains unescaped double quotes ("), which break the reader at parser.Read():
2022/06/09 17:33:54 parse error on line 2, column 5: extraneous or missing " in quoted-field
And if I set parser.LazyQuotes = true, I get:
2022/06/09 17:34:15 record on line 2: wrong number of fields
Faulty CSV file (reduced to a minimum), foo.csv:
1|2
"a|b
So I need to remove all occurrences of the double quote character ("). I'm currently doing that on the whole file from the terminal with sed 's/"//g', but I want to do it from the Go program instead.
How should I do it, given that I'm reading the file like this:
func processCSV(filepath string) {
	// open the file named by the parameter
	file, err := os.Open(filepath)
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()

	parser := csv.NewReader(file)
	parser.Comma = '|'
	// parser.LazyQuotes = true

	// skip headers
	if _, err := parser.Read(); err != nil {
		log.Fatal(err)
	}
	for {
		record, err := parser.Read()
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		// process record
	}
}
CodePudding user response:
Create an io.Reader that removes quotes from data read through an underlying io.Reader.
// rmquote reads r with " removed.
type rmquote struct {
	r io.Reader
}

func (c rmquote) Read(p []byte) (int, error) {
	n, err := c.r.Read(p)
	// i is output position for loop below
	i := 0
	// for each byte read from the file
	for _, b := range p[:n] {
		// skip quotes
		if b == '"' {
			continue
		}
		// copy byte to output position and advance position
		p[i] = b
		i++
	}
	// output position is the new length
	return i, err
}
Plumb it in between the CSV reader and file:
parser := csv.NewReader(rmquote{file})