I am reading code for a project written by others. The main task of the project is to read contents from a large structured text file (.txt) with 8 columns into a KnowledgeBase object, which have a number of methods and variables. The KnowledgeBase object is then output into a binary file. For example, the KnowledgeBase class has at least these two variables:
map<string, pair<string, string>> key_info
vector<ObjectInfo> objects
...
These variables are easy to understand when I track the code with gdb. Then, it seems it is converting such vectors and maps into binary forms. And the two variables above have their corresponding binary forms:
BinaryKeyInfo *bkeys
BinaryObjectInfo *bObjects
Later on when outputting to binary file, it has such code:
fwrite((char*)(&wcount),sizeof(int32_t),1,output);
fwrite((char*)bkeys,sizeof(KeyInfo_t),wcount,output);
The converting code from the original KnowledgeBase to binary is complicated. My question is, what's the main purpose of this conversion? Is it for faster loading of binary file into memory than plain text file? The plain text file is large. I learnt that object serialization is primarily for transmitting objects over the net, but I don't think the purpose here is for that. It is more like for speeding up data loading and memory saving. Could that be part of object serialization in C ?
CodePudding user response:
Is the main purpose of object serialization in C for faster object loading?
No. The most important purpose of serialisation is to transform the state of the program into a format that can be stored on the filesystem, or that can be communicated across a network, and that can be de-serialised back. Often, the purpose of either is for another program to do the de-serialisation. Sometimes the de-serialiser is another instance of the same program.
The speed of de-serialisation is one metric that can be used to gauge whether one particular serialisation format is a good one. The ability to quickly undo what you have done is not the reason why you do it in the first place.
what's the benefit of converting them into binary vectors or maps?
As I mention above, the benefit of serialisation is the ability to store the serialised data on the filesystem, or to send it over a network.
what' the benefit between plain text files VS binary files?
Pros of text serialisation format:
- Humans are able to read and write plain text. Humans generally are not able read nor write binary files.
- It's generally easier to implement a plain text format de-/serialiser in a way that works across differing computers than it is to implement a binary format de-/serialiser that achieves the same.
Pros of binary serialisation format:
- Typically faster and uses less storage and bandwidth.
- Can be easier to implement if there is no need for communication between differing systems. This is typically only the case in very simple cases. (Furthermore, there usually is a need for cross-system compatibility, even if the need haven't been realised yet).