can't read quoted field with gocsv

Time:01-10

I have a csv response that comes from an endpoint that I don't control and I'm failing to parse its response because it has quotes. It looks something like this:

name,id,quantity,"status (active, expired)"
John,14,4,active 
Bob,12,7,expired

To parse this response I have created the following struct:

type UserInfo struct {
Name     string `csv:"name"`
ID       string `csv:"id"`
Quantity string `csv:"quantity"`
Status   string `csv:"status (active, expired)"`
}

I have tried using

Status   string `csv:""status (active, expired)""`
Status   string `csv:'"status (active, expired)"'`

but neither helps; I still can't access the Status field when I use gocsv.Unmarshal.

var actualResult []UserInfo
err = gocsv.Unmarshal(in, &actualResult)

for _, elem := range actualResult {
    fmt.Println(elem.Status)
}

And I get nothing as a response.

https://go.dev/play/p/lje1zNO9w6E here's an example

CodePudding user response:

You don't need a third-party package like gocsv (unless you have a specific use case), since this can be done easily with Go's built-in encoding/csv.

You just have to skip the first line/record, which is the CSV header in your endpoint's response.

csvReader := csv.NewReader(strings.NewReader(csvString))

records, err := csvReader.ReadAll()
if err != nil {
    panic(err)
}

var users []UserInfo

// Iterate over all records excluding first one i.e., header
for _, record := range records[1:] {
    users = append(users, UserInfo{Name: record[0], ID: record[1], Quantity: record[2], Status: record[3]})
}

fmt.Printf("%v", users)
// Output: [{ John 14 4 active } { Bob 12 7 expired}]

Here is a working example on Go Playground based on your use case and sample string.
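If you'd rather not rely on column positions, here is a minimal sketch that looks columns up by header name instead. It also shows that encoding/csv handles the quoted header without trouble: the quotes are stripped and the comma stays inside the field. The parseUsers helper and the sample string are my own names for illustration, and trimming the trailing space in the status column is an assumption about what you want.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// UserInfo mirrors the struct from the question; no csv tags are needed here.
type UserInfo struct {
	Name     string
	ID       string
	Quantity string
	Status   string
}

// parseUsers (hypothetical helper) reads CSV text and maps columns to
// fields by header name, so it does not depend on column order.
func parseUsers(data string) ([]UserInfo, error) {
	records, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		return nil, err
	}
	if len(records) == 0 {
		return nil, fmt.Errorf("empty CSV input")
	}

	// encoding/csv strips the quotes, so the header name keeps its comma.
	col := make(map[string]int, len(records[0]))
	for i, name := range records[0] {
		col[name] = i
	}
	for _, want := range []string{"name", "id", "quantity", "status (active, expired)"} {
		if _, ok := col[want]; !ok {
			return nil, fmt.Errorf("missing column %q", want)
		}
	}

	users := make([]UserInfo, 0, len(records)-1)
	for _, rec := range records[1:] {
		users = append(users, UserInfo{
			Name:     rec[col["name"]],
			ID:       rec[col["id"]],
			Quantity: rec[col["quantity"]],
			// Assumption: the trailing space in "active " is not wanted.
			Status: strings.TrimSpace(rec[col["status (active, expired)"]]),
		})
	}
	return users, nil
}

func main() {
	const sample = "name,id,quantity,\"status (active, expired)\"\nJohn,14,4,active \nBob,12,7,expired\n"
	users, err := parseUsers(sample)
	if err != nil {
		panic(err)
	}
	for _, u := range users {
		fmt.Printf("%s: %s\n", u.Name, u.Status)
	}
}
```

The cost is a map lookup per field, which is negligible for a response of this size, and a missing column is reported as an error instead of silently producing wrong data.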

CodePudding user response:

I simply don't think gocarina/gocsv can parse a header with a quoted comma. I don't see it spelled out anywhere in the documentation that it cannot, but I did some digging and there are clear examples of commas being used in the "CSV annotations", and it looks like the author only conceived of commas in the annotations being used for the purposes of the package/API, and not as part of the column name.

If we look at sample_structs_test.go from the package, we can see commas being used in some of the following ways:

  • in metadata directives, like "omitempty":

    type Sample struct {
        Foo  string  `csv:"foo"`
        Bar  int     `csv:"BAR"`
        Baz  string  `csv:"Baz"`
        ...
        Omit *string `csv:"Omit,omitempty"`
    }
    
  • for declaring that a field in the struct can be populated from multiple, different headers:

    type MultiTagSample struct {
        Foo string `csv:"Baz,foo"`
        Bar int    `csv:"BAR"`
    }
    

    You can see this in action, here.

FWIW, the official encoding/json package has the same limitation, and they note it (emphasis added):

The encoding of each struct field can be customized by the format string stored under the "json" key in the struct field's tag. The format string gives the name of the field, possibly followed by a comma-separated list of options. The name may be empty in order to specify options without overriding the default field name.

and

The key name will be used if it's a non-empty string consisting of only Unicode letters, digits, and ASCII punctuation except quotation marks, backslash, and comma.

So, you may not be able to get what you expect/want: sorry, this may just be a limitation of having the ability to annotate your structs. If you want, you could file a bug with gocarina/gocsv.
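To make the limitation concrete, here is a small sketch of what happens when a tag value is treated as a comma-separated list. This is not gocsv's actual code; tagParts is a hypothetical helper that just reads the tag with reflect and splits it on commas, which is the convention described above.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// UserInfo carries only the problematic field from the question.
type UserInfo struct {
	Status string `csv:"status (active, expired)"`
}

// tagParts (hypothetical helper) splits a field's csv tag on commas.
// It illustrates the comma-separated convention, not gocsv internals.
func tagParts(v interface{}, field string) []string {
	f, ok := reflect.TypeOf(v).FieldByName(field)
	if !ok {
		return nil
	}
	return strings.Split(f.Tag.Get("csv"), ",")
}

func main() {
	for _, p := range tagParts(UserInfo{}, "Status") {
		fmt.Printf("%q\n", p)
	}
}
```

The tag splits into "status (active" and " expired)", and neither piece matches the real header "status (active, expired)", which would explain why Status stays empty.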

In the meantime, you can just modify the header as it's coming in. This example is pretty hacky, but it works: it just replaces "status (active, expired)" with "status (active expired)" and uses the comma-less version to annotate the struct.

endpointReader := strings.NewReader(sCSV)

// Fix header: read the first line and swap the quoted column name
// for a comma-less one that matches the struct tag
var bTmp bytes.Buffer
fixer := bufio.NewReader(endpointReader)
header, err := fixer.ReadString('\n')
if err != nil {
    panic(err)
}
header = strings.ReplaceAll(header, "\"status (active, expired)\"", "status (active expired)")
bTmp.WriteString(header)
// Read rest of CSV
if _, err := bTmp.ReadFrom(fixer); err != nil {
    panic(err)
}

// Turn back into a reader
reader := bytes.NewReader(bTmp.Bytes())

var actualResult []UserInfo
...

I can run that and now get:

active 
expired