Go: CSV NewReader not getting the correct number of fields

Time:05-20

How can I get the correct number of fields when using csv.NewReader?

package main

import (
    "encoding/csv"
    "fmt"
    "log"
    "strings"
)

func main() {
    parser := csv.NewReader(strings.NewReader(`||""FOO""||`))
    parser.Comma = '|'
    parser.LazyQuotes = true
    record, err := parser.Read()
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("record length: %v\n", len(record))
}

https://go.dev/play/p/gg-KYRciWFH

It should return 5, but instead I'm getting 3:

record length: 3

Program exited.

EDIT

I'm actually working with a big CSV file containing many double quotes.

CodePudding user response:

After examining your code, I decided to modify it slightly and then print the results:

package main

import (
    "encoding/csv"
    "fmt"
    "log"
    "strings"
)

func main() {
    parser := csv.NewReader(strings.NewReader(`x||""FOO""|x|x\n`))
    parser.Comma = '|'
    parser.LazyQuotes = true
    record, err := parser.Read()
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("record length: %v, Data: %v\n", len(record), strings.Join(record, ", "))
}

When you run this, the record length is 3 and the data is printed as x, , "FOO"|x|x\n (inside a raw string literal, \n is a literal backslash followed by n, not a newline). My thought is that the "" after FOO is read as an escaped quote inside a quoted field, so the parser assumes the field is still open and lumps the rest of the line into the third entry. This looks like a bug in how lazy quoting works in the csv package. However, the documentation for LazyQuotes says this:

If LazyQuotes is true, a quote may appear in an unquoted field and a non-doubled quote may appear in a quoted field.

This doesn't say anything about doubled quotes ("") appearing inside a quoted field, which are still treated as an escaped quote. To fix this, you should either remove the quotes altogether or replace each doubled double-quote ("") with a single double quote (").
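The second suggestion (replacing "" with ") can be sketched as a preprocessing step. This is a minimal sketch, not a general fix: parseLine is a hypothetical helper name, and the blanket replacement assumes the data never contains an intentionally empty quoted field ("") or a correctly escaped quote, because those would be rewritten as well.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"log"
	"strings"
)

// parseLine is a hypothetical helper: it rewrites every "" to " and then
// parses the result as a single pipe-separated record.
// Assumption: the input never contains a legitimate "" (empty quoted field
// or escaped quote), since those would be rewritten too.
func parseLine(line string) ([]string, error) {
	cleaned := strings.ReplaceAll(line, `""`, `"`)
	r := csv.NewReader(strings.NewReader(cleaned))
	r.Comma = '|'
	r.LazyQuotes = true
	return r.Read()
}

func main() {
	record, err := parseLine(`||""FOO""||`)
	if err != nil {
		log.Fatal(err)
	}
	// %q makes the empty fields visible.
	fmt.Printf("record length: %d, data: %q\n", len(record), record)
}
```

With the sample line this yields five fields, the third being FOO without quotes. For a large file, the same replacement could be applied line by line before handing the text to the reader, provided the assumption above holds for the whole file.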

One other thing you might consider would be using the gocsv package. I've worked with this package in the past and it's reasonably stable. I'm not sure how it would respond to this specific issue, but it might be worth your time checking it out.

CodePudding user response:

Note:

The encoding/csv package implements the RFC 4180 standard. If you have such input, that's not an RFC 4180 compliant CSV file and encoding/csv will not parse it properly.


You're misusing the quotes. Quoting a single field FOO is like this:

parser := csv.NewReader(strings.NewReader(`||"FOO"||`))

If you want the field to have the value "FOO" (including the quotes), you have to quote the field and write each quote inside it as two double quotes, so it should be:

parser := csv.NewReader(strings.NewReader(`||"""FOO"""||`))

This will print record length: 5.
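One way to check what a compliant line looks like is to let csv.Writer produce it from the desired field values. The sketch below assumes the same '|' separator as the question; encodeRecord is a hypothetical helper name.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"log"
	"strings"
)

// encodeRecord is a hypothetical helper: it writes one record using '|' as
// the separator and returns the raw line that csv.Writer produced.
func encodeRecord(record []string) (string, error) {
	var sb strings.Builder
	w := csv.NewWriter(&sb)
	w.Comma = '|'
	if err := w.Write(record); err != nil {
		return "", err
	}
	w.Flush()
	if err := w.Error(); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// Five fields; the third one holds FOO surrounded by literal quotes.
	line, err := encodeRecord([]string{"", "", `"FOO"`, "", ""})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(line) // prints ||"""FOO"""||
}
```

The writer wraps the field in quotes and doubles each quote inside it, which is the same form as the corrected input above.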

What you have is this:

parser := csv.NewReader(strings.NewReader(`||""FOO""||`))

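The first " opens a quoted field. Since the second " is followed by F rather than a separator character, it does not close the field (with LazyQuotes it is kept as a literal quote). The "" after FOO is then read as an escaped quote, so the field is never closed and the rest of the input, including the remaining || separators, is processed as the content of the quoted field (which will terminate at the end of the line).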

If you print the record:

fmt.Println(record)
fmt.Printf("%#v", record)

The output will be:

[  "FOO"||]
[]string{"", "", "\"FOO\"||"}
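To find such lines in a large file, one option is to parse with LazyQuotes left at its default (false), in which case encoding/csv reports the malformed quoting as an error instead of merging fields. A minimal sketch, assuming the '|' separator from the question; strictRead and isQuoteError are hypothetical helper names.

```go
package main

import (
	"encoding/csv"
	"errors"
	"fmt"
	"log"
	"strings"
)

// strictRead is a hypothetical helper: it parses one pipe-separated line
// with LazyQuotes left at its default value (false).
func strictRead(line string) ([]string, error) {
	r := csv.NewReader(strings.NewReader(line))
	r.Comma = '|'
	return r.Read()
}

// isQuoteError reports whether err wraps csv.ErrQuote, the error returned
// for an extraneous or missing quote in a quoted field.
func isQuoteError(err error) bool {
	return errors.Is(err, csv.ErrQuote)
}

func main() {
	for _, in := range []string{`||"""FOO"""||`, `||""FOO""||`} {
		rec, err := strictRead(in)
		if isQuoteError(err) {
			fmt.Printf("%s: rejected in strict mode\n", in)
			continue
		}
		if err != nil {
			log.Fatal(err)
		}
		fmt.Printf("%s: %d fields %q\n", in, len(rec), rec)
	}
}
```

The correctly escaped line parses into five fields, while the line from the question is rejected, which makes the bad rows easy to locate before deciding how to repair them.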

CodePudding user response:

Quotes are part of the CSV format.

The problem is with how the quotes are escaped for encoding/csv. You can try something like this:

package main

import (
    "encoding/csv"
    "fmt"
    "log"
    "strings"
)

func main() {
    parser := csv.NewReader(strings.NewReader(`||FOO||`))
    parser.Comma = '|'
    parser.LazyQuotes = true
    record, err := parser.Read()
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("record length: %v\n", len(record))
    fmt.Println(strings.Join(record, " /SEP/ "))
}

or like this:

package main

import (
    "encoding/csv"
    "fmt"
    "log"
    "strings"
)

func main() {
    parser := csv.NewReader(strings.NewReader(`||"""FOO"""||`))
    parser.Comma = '|'
    parser.LazyQuotes = true
    record, err := parser.Read()
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("record length: %v\n", len(record))
    fmt.Println(strings.Join(record, " SEP "))
}
