How can I compare read(1.proto) = read(2.proto) in Go (assuming there's just one message definition)?

Time:12-07

Context: I'm trying to resolve this issue.

In other words, there's a NormalizeJsonString() function for JSON strings (see this for more context):

// Takes a value containing JSON string and passes it through
// the JSON parser to normalize it, returns either a parsing
// error or normalized JSON string.
func NormalizeJsonString(jsonString interface{}) (string, error) {
    // ... (body omitted)
}

which allows code along these lines (simplified, since the function actually returns both a string and an error):

return structure.NormalizeJsonString(old) == structure.NormalizeJsonString(new)
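For context, the parse-and-reserialize idea behind NormalizeJsonString can be sketched with only the standard library. This is a minimal sketch, not the SDK's actual code: normalizeJSON and jsonEqual are hypothetical names, and the comparison handles the error results explicitly, because a function returning (string, error) cannot be used directly in an == expression.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// normalizeJSON (hypothetical stand-in for NormalizeJsonString) parses the
// input and re-serializes it, so insignificant whitespace and object key
// order no longer affect the result.
func normalizeJSON(s string) (string, error) {
	var v interface{}
	if err := json.Unmarshal([]byte(s), &v); err != nil {
		return s, err
	}
	// json.Marshal writes map keys in sorted order, giving a canonical form.
	b, err := json.Marshal(v)
	if err != nil {
		return s, err
	}
	return string(b), nil
}

// jsonEqual compares two JSON documents semantically, surfacing any parse
// error instead of silently dropping it.
func jsonEqual(a, b string) (bool, error) {
	na, err := normalizeJSON(a)
	if err != nil {
		return false, err
	}
	nb, err := normalizeJSON(b)
	if err != nil {
		return false, err
	}
	return na == nb, nil
}

func main() {
	eq, err := jsonEqual(`{"b": 1, "a": [1, 2]}`, `{"a":[1,2],"b":1}`)
	fmt.Println(eq, err) // true <nil>
}
```

The same shape (normalize both sides, then compare) is what a NormalizeProtoString would need; the hard part for .proto files is the "parse" step.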

but it doesn't work for strings that are .proto files (every proto file is guaranteed to contain just one message definition). For example, I could see a diff like:

            syntax = "proto3";
          - package bar.proto;
            
            package bar.proto;
            option java_outer_classname = "FooProto";
            
            message Foo {
              ...
          -   int64 xyz = 3;
              int64  xyz = 3;

Is there a NormalizeProtoString available in any Go SDK? I found MessageDifferencer, but it's C++ only. Another option I considered was replacing all newlines and runs of whitespace with a single space, but that's a bit hacky.

Answer:

To do this semantically, the proto definitions really need to be parsed. Naively stripping or replacing whitespace may get you somewhere, but it will likely run into gotchas.
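To illustrate the kind of gotcha meant here, the whitespace-collapsing approach from the question can be sketched with strings.Fields. This is a minimal sketch; collapseWhitespace is a hypothetical name. It absorbs extra spaces, but still treats `int64 xyz=3;` and `int64 xyz = 3;` as different even though both declare the same field, and it still compares comments.

```go
package main

import (
	"fmt"
	"strings"
)

// collapseWhitespace is the "hacky" normalization: split on any run of
// whitespace (spaces, tabs, newlines) and rejoin with single spaces.
func collapseWhitespace(s string) string {
	return strings.Join(strings.Fields(s), " ")
}

func main() {
	a := "message Foo {\n  int64 xyz = 3;\n}"
	b := "message Foo {\n  int64  xyz = 3;\n}"
	c := "message Foo {\n  int64 xyz=3;\n}"
	d := "message Foo {\n  // the xyz field\n  int64 xyz = 3;\n}"

	fmt.Println(collapseWhitespace(a) == collapseWhitespace(b)) // true: extra spaces are absorbed
	fmt.Println(collapseWhitespace(a) == collapseWhitespace(c)) // false: "xyz=3;" vs "xyz = 3;"
	fmt.Println(collapseWhitespace(a) == collapseWhitespace(d)) // false: the comment is still compared
}
```

Whitespace that separates tokens is optional in the proto language, so a text-level normalization can only ever approximate token-level equality.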

As far as I'm aware, the latest official Go protobuf package doesn't have anything for parsing protobuf definitions; the protoc compiler handles that side of affairs, and it is written in C++.

One option would be to execute the protoc compiler to get hold of the descriptor set output (e.g. protoc --descriptor_set_out=...), but I'm guessing this would also be slightly haphazard, since it requires protoc to be available, and differences between protoc versions could potentially cause problems too.

Assuming that is a no-go, one further option is to use a third-party parser written in Go: github.com/yoheimuta/go-protoparser seems to handle things quite well. One slight issue when making comparisons is that the parser records meta information about source line and column positions for each element; however, it is relatively easy to make a comparison that ignores these by using github.com/google/go-cmp.

For example:

package main

import (
    "fmt"
    "log"
    "os"

    "github.com/google/go-cmp/cmp"
    "github.com/google/go-cmp/cmp/cmpopts"
    "github.com/yoheimuta/go-protoparser/v4"
    "github.com/yoheimuta/go-protoparser/v4/parser"
    "github.com/yoheimuta/go-protoparser/v4/parser/meta"
)

func main() {
    if err := run(); err != nil {
        log.Fatal(err)
    }
}

func run() error {
    proto1, err := parseFile("example1.proto")
    if err != nil {
        return err
    }

    proto2, err := parseFile("example2.proto")
    if err != nil {
        return err
    }

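    // Compare the two parsed ASTs, ignoring source position metadata
    // (line/column), so formatting-only differences are not reported.
    equal := cmp.Equal(proto1, proto2, cmpopts.IgnoreTypes(meta.Meta{}))

    fmt.Printf("equal: %t\n", equal)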

    return nil
}

// parseFile opens a .proto file and parses it into a go-protoparser AST.
func parseFile(path string) (*parser.Proto, error) {
    f, err := os.Open(path)
    if err != nil {
        return nil, err
    }
    defer f.Close()

    return protoparser.Parse(f)
}

outputs:

equal: true

for the example you provided.
