Why is the speech REST API response different from the go SDK API response?

Time:06-16

When calling the Speech-to-Text API via REST, the response structure is slightly different from the one I get when calling it with the Go SDK.

For example, I submitted an asynchronous speech job via the Go SDK. Below are the results of querying Google Cloud for that transcription job via two different methods, REST and the Go SDK, which return slightly different results.

Method 1: REST call

GET https://speech.googleapis.com/v1/operations/{id}

{id} is the operation ID, e.g. 2593790426826555555.

RESULT 1: camelCased attributes, with string-typed startTime and endTime values.

"words": [
  {
    "startTime": "0s",
    "endTime": "0.400s",
    "word": "We",
    "confidence": 0.98762906
  },
...

Method 2: go SDK

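// Error handling omitted for brevity.
client, err := speech.NewClient(ctx)
// Fetch the long-running operation by name.
op, err := client.LROClient.GetOperation(ctx, &lropb.GetOperationRequest{Name: id})
// Unpack the Any-typed response into the concrete message type.
resp := new(speechpb.LongRunningRecognizeResponse)
err = op.GetResponse().UnmarshalTo(resp)
// Serialize with encoding/json and write the result to disk.
js, err := json.Marshal(resp)
os.WriteFile("sdk-response.json", js, 0644) // os.WriteFile replaces the deprecated ioutil.WriteFile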

RESULT 2: snake_cased attributes, with object-typed start_time and end_time values.

"words": [
{
  "start_time": {},
  "end_time": {
    "nanos": 400000000
  },
  "word": "We",
  "confidence": 0.98762906
},
...

If you hunt down the type information in the SDK code, it does use start_time as the json tag, so I suppose this is expected behavior. Or could I be unmarshalling the response incorrectly with op.GetResponse().UnmarshalTo(resp)? Any help or advice is appreciated.

StartTime *durationpb.Duration `protobuf:"bytes,1,opt,name=start_time,json=startTime,proto3" json:"start_time,omitempty"`

Using Go 1.18.1 and cloud.google.com/go/speech v1.4.0.

Update, elaborating on the rationale for the question: I have two sets of transcripts that were downloaded via different methods (storage buckets vs. SDK). One set was pulled from Google Cloud Storage buckets, where Google persists them camelCased (the same format as the REST API). The other set was pulled from the SDK API and persisted using JSON encoding in Go, which applies snake_casing per the SDK's struct layout.

It isn't a huge deal to write some code to normalize everything to a single format, but it is somewhat of an inconsistency in my opinion. I'm raising the question to learn whether I'm doing something wrong that could be corrected, or whether this is to be expected.
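
For illustration, the normalization mentioned above could look roughly like the following stdlib-only sketch. It is not part of the SDK: the helper names (snakeToCamel, durationString, normalizeJSON) are made up for this example, and it only handles the two differences shown in the question (snake_case keys, and start_time/end_time objects with seconds/nanos fields). Negative durations and any other encoding differences are not covered.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"strings"
)

// snakeToCamel converts a key such as "start_time" to "startTime".
func snakeToCamel(s string) string {
	parts := strings.Split(s, "_")
	for i := 1; i < len(parts); i++ {
		if parts[i] != "" {
			parts[i] = strings.ToUpper(parts[i][:1]) + parts[i][1:]
		}
	}
	return strings.Join(parts, "")
}

// durationString renders a decoded {"seconds": ..., "nanos": ...} object in
// the REST style, e.g. "0.400s". Missing fields count as zero.
// Assumption: durations are non-negative.
func durationString(m map[string]any) string {
	var secs, nanos int64
	if n, ok := m["seconds"].(json.Number); ok {
		secs, _ = n.Int64()
	}
	if n, ok := m["nanos"].(json.Number); ok {
		nanos, _ = n.Int64()
	}
	s := fmt.Sprintf("%d.%09d", secs, nanos)
	// Keep 3, 6, or 9 fractional digits, or none at all for whole seconds.
	s = strings.TrimSuffix(s, "000")
	s = strings.TrimSuffix(s, "000")
	s = strings.TrimSuffix(s, ".000")
	return s + "s"
}

// normalize walks a decoded JSON value, renaming keys to camelCase and
// replacing start_time/end_time objects with duration strings.
func normalize(v any) any {
	switch t := v.(type) {
	case map[string]any:
		out := make(map[string]any, len(t))
		for k, val := range t {
			if k == "start_time" || k == "end_time" {
				if d, ok := val.(map[string]any); ok {
					out[snakeToCamel(k)] = durationString(d)
					continue
				}
			}
			out[snakeToCamel(k)] = normalize(val)
		}
		return out
	case []any:
		for i := range t {
			t[i] = normalize(t[i])
		}
		return t
	default:
		return v
	}
}

// normalizeJSON converts encoding/json-style output to the REST-style layout.
func normalizeJSON(in []byte) ([]byte, error) {
	dec := json.NewDecoder(bytes.NewReader(in))
	dec.UseNumber() // keep numbers such as confidence exactly as written
	var v any
	if err := dec.Decode(&v); err != nil {
		return nil, err
	}
	return json.Marshal(normalize(v))
}

func main() {
	in := []byte(`{"words":[{"start_time":{},"end_time":{"nanos":400000000},"word":"We","confidence":0.98762906}]}`)
	out, err := normalizeJSON(in)
	if err != nil {
		panic(err)
	}
	// Prints: {"words":[{"confidence":0.98762906,"endTime":"0.400s","startTime":"0s","word":"We"}]}
	fmt.Println(string(out))
}
```

Keys come out in alphabetical order because json.Marshal sorts map keys; the content matches the REST example apart from ordering.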

CodePudding user response:

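The structs you are marshaling are Go protobuf messages. encoding/json uses their json struct tags, which is why the fields come out snake_cased and why the times, which are google.protobuf.Duration messages rather than plain strings, come out as objects.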

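Can you try using the Go protobuf protojson package (google.golang.org/protobuf/encoding/protojson) instead of encoding/json? It implements the canonical protobuf JSON mapping, so it should produce camelCased field names and string-typed durations, and it can parse that JSON back into the Go protobuf structs.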
