Reading raw byte data from a file and decoding it into protobuf structs

Time:09-30

What I'm trying to do: I have a dump from a Kafka stream containing an unknown number of protobuf records stored in binary format. I want to decode them and print them to the console one by one as JSON. I have searched extensively, but there seems to be no clear answer on reading a raw binary file that holds an unknown number of protobuf records. I found "How to decode binary/raw google protobuf data", but it only covers decoding a single known record with protoc.

I've tried the following, but I don't fully understand how to work with the proto.Buffer struct: I only see the first value out of all 26 KB of data.

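package main

import (
    "encoding/json"
    "fmt"
    "io/ioutil"
    "log"

    "github.com/golang/protobuf/proto"

    "parseRawDHCP/pb"
)

func main() {
    file, err := ioutil.ReadFile("file")
    if err != nil {
        // Stop here: there is nothing to decode if the read failed.
        log.Fatalf("unable to read file: %v", err)
    }
    buffer := proto.NewBuffer(file)
    for {
        // Use a fresh message on every iteration: DecodeMessage does not
        // reset its argument, so a reused struct would mix records together.
        msg := pb.Msg{}
        // DecodeMessage expects a varint length prefix before each message.
        // It also returns an error once the buffer is exhausted.
        if err := buffer.DecodeMessage(&msg); err != nil {
            panic(fmt.Sprintf("unable to decode message: %v", err))
        }
        marshalledStruct, err := json.Marshal(&msg)
        if err != nil {
            panic(fmt.Sprintf("can't marshal the message to JSON: %v", err))
        }
        // json.Marshal returns []byte; print it as text, one record per line.
        fmt.Printf("message is: %s\n", marshalledStruct)
    }
}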

If someone can point me in the right direction on how to correctly decode raw binary data into protobuf messages, I would greatly appreciate it.

CodePudding user response:

A proto message by itself comes with no length and no end-of-message indication.

If your file contains marshalled proto messages all jammed together, then there's no way to decode them individually. An attempt to decode multiple messages as a single one will decode everything into a single struct, with each later value of a scalar field overwriting the earlier one (repeated fields are appended instead).
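To see why, here is a minimal sketch that uses only the standard library. It assumes a hypothetical message with a single varint field numbered 1 and hand-encodes two such messages. The helper name scanVarintFields is made up for illustration and handles only varint fields, so treat it as a demonstration of the wire format rather than a real decoder.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// scanVarintFields walks protobuf wire data that contains only varint
// fields (wire type 0) and returns the last value seen per field number.
// Hypothetical helper for illustration; real decoders handle all wire types.
func scanVarintFields(data []byte) (map[uint64]uint64, error) {
	fields := make(map[uint64]uint64)
	for len(data) > 0 {
		// Each field starts with a key: (field number << 3) | wire type.
		key, n := binary.Uvarint(data)
		if n <= 0 {
			return nil, errors.New("bad field key")
		}
		data = data[n:]
		if key&7 != 0 {
			return nil, errors.New("only varint fields are supported in this sketch")
		}
		val, n := binary.Uvarint(data)
		if n <= 0 {
			return nil, errors.New("bad varint value")
		}
		data = data[n:]
		fields[key>>3] = val // a later occurrence overwrites the earlier one
	}
	return fields, nil
}

func main() {
	first := []byte{0x08, 0x01}  // message 1: field 1 = 1
	second := []byte{0x08, 0x02} // message 2: field 1 = 2
	joined := append(append([]byte{}, first...), second...)

	fields, err := scanVarintFields(joined)
	if err != nil {
		panic(err)
	}
	// Nothing in the bytes marks where message 1 ended, so the stream
	// reads as one message in which field 1 was set twice.
	fmt.Println(fields[1]) // prints 2
}
```

The joined bytes are a perfectly valid single message, which is why a decoder cannot tell that two records were written.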

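If your file contains length-prefixed messages (see proto.Buffer.EncodeMessage, which writes a varint-encoded length before each message), then your sample code should be able to decode them (and panic at EOF, since the loop never exits on its own). But I doubt that they were serialized that way.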
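For reference, the length-prefixed layout is a varint byte count followed by that many bytes of message data, repeated until the end of the file. The following is a minimal sketch of splitting such a dump into individual payloads with only the standard library. splitDelimited is a hypothetical helper name and the sample bytes are made up; each returned slice would then be passed to proto.Unmarshal with the real message type. It stops cleanly once the input is exhausted and reports truncated data as an error.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// splitDelimited splits data into payloads, assuming every payload is
// preceded by its length encoded as a varint. Hypothetical helper; it
// does not decode the payloads themselves.
func splitDelimited(data []byte) ([][]byte, error) {
	var frames [][]byte
	for len(data) > 0 {
		size, n := binary.Uvarint(data)
		if n <= 0 {
			return nil, errors.New("malformed length prefix")
		}
		data = data[n:]
		if size > uint64(len(data)) {
			return nil, errors.New("truncated message")
		}
		frames = append(frames, data[:size])
		data = data[size:]
	}
	return frames, nil
}

func main() {
	// Two hand-made frames: a 2-byte payload, then a 3-byte payload.
	dump := []byte{0x02, 0x08, 0x01, 0x03, 0x10, 0xAC, 0x02}

	frames, err := splitDelimited(dump)
	if err != nil {
		panic(err)
	}
	for i, f := range frames {
		fmt.Printf("frame %d: %d bytes\n", i, len(f))
	}
}
```

If the first "length" read from a real dump is implausible, or the split ends with a truncation error, that is a strong hint the file was not written in this format.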
