I was looking for the quickest and most efficient way to persist structs to the filesystem. I came across the encoding/gob package, which lets you set up encoders and decoders that convert structs to []byte (binary) for storage.
This was relatively easy - here's a decoding example:
// Per-item get request.
// binary = []byte holding the gob-encoded value from the database
// target = struct that receives the decoded value
func Get(path string, target *SomeType) error {
	binary := someFunctionToGetBinaryFromSomeDB(path)
	dec := gob.NewDecoder(bytes.NewReader(binary))
	return dec.Decode(target)
}
However, when I benchmarked this against the JSON encoder/decoder, I found gob to be almost twice as slow. This was especially noticeable when I looped over all stored structs. Upon further research, I learned that creating a new decoder every time is really expensive, and the loop creates one for each of the 5000 or so items:
// Imagine 5000 items in total
func GetAll(target *[]SomeType) error {
	results := getAllBinaryStructsFromSomeDB()
	for results.next() {
		binary := results.getBinary()
		// Making a new decoder 5000 times
		dec := gob.NewDecoder(bytes.NewReader(binary))
		var item SomeType
		if err := dec.Decode(&item); err != nil {
			return err
		}
		*target = append(*target, item)
	}
	return nil
}
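For anyone wanting to reproduce the comparison, here is a rough sketch of how it could be measured. Item is a hypothetical stand-in for SomeType, the helper names are made up for illustration, and json.Unmarshal stands in for whatever JSON decoding was actually used. testing.Benchmark is the standard library helper for running a benchmark outside of go test. The numbers depend heavily on the struct and the machine, so none are shown here.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"encoding/json"
	"fmt"
	"testing"
)

// Item is a hypothetical stand-in for SomeType.
type Item struct {
	ID   int
	Name string
}

// encodeBoth returns the gob and JSON encodings of one item.
func encodeBoth(it Item) (gobBlob, jsonBlob []byte, err error) {
	var buf bytes.Buffer
	if err = gob.NewEncoder(&buf).Encode(it); err != nil {
		return nil, nil, err
	}
	jsonBlob, err = json.Marshal(it)
	return buf.Bytes(), jsonBlob, err
}

// decodeGob builds a fresh decoder for a single stored blob, as Get does.
func decodeGob(blob []byte) (Item, error) {
	var it Item
	err := gob.NewDecoder(bytes.NewReader(blob)).Decode(&it)
	return it, err
}

// decodeJSON decodes the same record from its JSON form.
func decodeJSON(blob []byte) (Item, error) {
	var it Item
	err := json.Unmarshal(blob, &it)
	return it, err
}

func main() {
	g, j, err := encodeBoth(Item{ID: 1, Name: "example"})
	if err != nil {
		panic(err)
	}
	gobRes := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			if _, err := decodeGob(g); err != nil {
				b.Fatal(err)
			}
		}
	})
	jsonRes := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			if _, err := decodeJSON(j); err != nil {
				b.Fatal(err)
			}
		}
	})
	fmt.Println("gob, new decoder per item:", gobRes)
	fmt.Println("json.Unmarshal per item:  ", jsonRes)
}
```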
I'm stuck here trying to figure out how I can recycle (reduce reuse recycle!) a decoder for list retrieval. Since the decoder takes an io.Reader, I was thinking it might be possible to 'reset' the io.Reader and reuse the same reader, at the same address, for each new struct while keeping the same decoder. I'm not sure how to go about that, and I'm wondering if anyone can shed some light. What I'm looking for is something like this:
// Imagine 5000 items in total
func GetAll(target *[]SomeType) {
	// Set up some kind of recyclable reader
	var binary []byte
	reader := bytes.NewReader(binary)
	// Make decoder based on that reader
	dec := gob.NewDecoder(reader)
	results := getAllBinaryStructsFromSomeDB()
	for results.next() {
		// Insert some kind of binary / decoder reset
		// Then do something like:
		reader.Reset(results.getBinary())
		var item SomeType
		dec.Decode(&item) // except of course this won't work
		*target = append(*target, item)
	}
}
Thanks!
CodePudding user response:
"I was looking for the quickest/efficient way to store Structs of data to persist on the filesystem"
Instead of serializing your structs, represent your data primarily in a pre-made data store that fits your usage well. Then model that data in your Go code.
This may seem like the hard way or the long way to store data, but it will solve your performance problem by intelligently indexing your data and allowing filtering to be done without a lot of filesystem access.
"I was looking for ... data to persist."
Let's start there as a problem statement.
"gob module allows encoders and decoders to be set up for structs to convert to []byte (binary) that can be stored. However, ... I found it to be ... slow."
It would be. You'd have to go out of your way to make data storage any slower. Every object you instantiate from your storage will have to come from a filesystem read. The operating system will cache these small files well, but you'll still be reading the data every time.
Every change will require rewriting all the data, or cleverly determining which data to write to disk. Recall that there is no "insert between" operation for files; you'll be rewriting all bytes after to add bytes in the middle of a file.
You could do this concurrently, of course, and goroutines handle a bunch of async work like filesystem reads very well. But now you've got to start thinking about locking.
My point is, for the cost of trying to serialize your structures you can better describe your data at the persistence layer, and solve problems you're not even working on yet.
SQL is a pretty obvious choice, since you can make it work with SQLite as well as other SQL servers that scale well. I hear MongoDB is easy to wrangle these days. Depending on what you're doing with the data, Redis has a number of attractive list, set and key/value operations that can easily be made atomic and consistent.
CodePudding user response:
The encoder and decoder are designed to work with streams of values. The encoder writes information describing a Go type to the stream once before transmitting the first value of the type. The decoder retains received type information for decoding subsequent values.
The type information written by the encoder is dependent on the order that the encoder encounters unique types, the order of fields in structs and more. To make sense of the stream, a decoder must read the complete stream written by a single encoder.
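A small sketch of the first point, assuming a hypothetical Item type and a made-up helper name: it encodes two similar values with one encoder and records how many bytes each Encode call appends to the stream. The first call should append more, since it carries the type description as well as the value.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Item is a hypothetical record type used only for illustration.
type Item struct {
	ID   int
	Name string
}

// encodedSizes encodes every item with a single encoder and reports how
// many bytes each Encode call appended to the stream.
func encodedSizes(items []Item) ([]int, error) {
	var buf bytes.Buffer
	enc := gob.NewEncoder(&buf)
	sizes := make([]int, 0, len(items))
	for _, it := range items {
		before := buf.Len()
		if err := enc.Encode(it); err != nil {
			return nil, err
		}
		sizes = append(sizes, buf.Len()-before)
	}
	return sizes, nil
}

func main() {
	sizes, err := encodedSizes([]Item{{ID: 1, Name: "a"}, {ID: 2, Name: "b"}})
	if err != nil {
		panic(err)
	}
	// The first entry includes the type description; the second does not.
	fmt.Println(sizes)
}
```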
It is not possible to recycle decoders because of the way that type information is transmitted.
To make this more concrete, the following does not work:
var v1, v2 Type
var buf bytes.Buffer
// Two separate encoders write to the same buffer.
gob.NewEncoder(&buf).Encode(v1)
gob.NewEncoder(&buf).Encode(v2)
var v3, v4 Type
d := gob.NewDecoder(&buf)
d.Decode(&v3) // succeeds
d.Decode(&v4) // returns an error
Each new encoder writes information about Type to the buffer on its first call to Encode. The second call to Decode fails because the decoder receives a duplicate type definition.
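Here is a runnable sketch of the failing case next to the variant that works, where a single encoder writes every value and a single decoder reads them back. Item and the helper names are hypothetical, chosen only for illustration.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Item is a hypothetical record type used only for illustration.
type Item struct {
	ID   int
	Name string
}

// encodeSeparate uses a new encoder per value, so the type description
// is written to the buffer once per value.
func encodeSeparate(items []Item) (*bytes.Buffer, error) {
	buf := new(bytes.Buffer)
	for _, it := range items {
		if err := gob.NewEncoder(buf).Encode(it); err != nil {
			return nil, err
		}
	}
	return buf, nil
}

// encodeShared uses one encoder for all values, so the type description
// is written only once, ahead of the first value.
func encodeShared(items []Item) (*bytes.Buffer, error) {
	buf := new(bytes.Buffer)
	enc := gob.NewEncoder(buf)
	for _, it := range items {
		if err := enc.Encode(it); err != nil {
			return nil, err
		}
	}
	return buf, nil
}

// decodeAll reads n values from buf with a single decoder.
func decodeAll(buf *bytes.Buffer, n int) ([]Item, error) {
	dec := gob.NewDecoder(buf)
	out := make([]Item, 0, n)
	for i := 0; i < n; i++ {
		var it Item
		if err := dec.Decode(&it); err != nil {
			return out, err
		}
		out = append(out, it)
	}
	return out, nil
}

func main() {
	items := []Item{{ID: 1, Name: "a"}, {ID: 2, Name: "b"}}

	separate, err := encodeSeparate(items)
	if err != nil {
		panic(err)
	}
	_, err = decodeAll(separate, len(items))
	fmt.Println("separate encoders:", err) // a non-nil error

	shared, err := encodeShared(items)
	if err != nil {
		panic(err)
	}
	got, err := decodeAll(shared, len(items))
	fmt.Println("shared encoder:", got, err)
}
```

This only helps if all items can be written and read back as one stream. If each item is stored as its own blob written by its own encoder, each blob still needs its own decoder.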