I have created a function serialize which takes Data (a class containing 4 members: int32, int64, float, double) as input and returns an encoded vector of bytes of all elements, which I will then pass to a deserialize function to get the original data back.
std::vector<uint8_t> serialize(Data &D)
{
    std::vector<uint8_t> serialized_data;
    std::vector<uint8_t> intwo = encode(D.Int32); // output [32 13 24 0]
    std::vector<uint8_t> insf = encode(D.Int64);  // output [233 244 55 134 255 23 55]
    // float: encode in binary format
    float ft = D.Float; // float value, e.g. 4.55
    char result[sizeof(float)];
    memcpy(result, &ft, sizeof(ft));
    // double: encode in binary format
    double dt = D.Double; // double value, e.g. 4.55
    char resultdouble[sizeof(double)];
    memcpy(resultdouble, &dt, sizeof(dt));
    /////
    ///// How to bind everything here?
    /////
    return serialized_data;
}
Data deserialize(std::vector<uint8_t> &Bytes) // vector returned from the function above
{
    Data D2;
    D2.Int64 = decode(Bytes, D2);
    // D2.Int32 = decode(Bytes, D2);
    // D2.Float = decode(Bytes, D2);
    // D2.Double = decode(Bytes, D2);
    /// Return the original data (all class members)
    return D2;
}
I don't have any idea of how to move forward.
Q1. If I bind everything in a single vector, how would I dissect them while deserializing? Should there be some kind of delimiter?
Q2. Is there any better way of doing it?
CodePudding user response:
If I bind everything in a single vector, how would I dissect them while deserializing? Should there be some kind of delimiter?
In a stream, you either know what type comes next, or you'll have to have some sort of type indicator in the stream. "Here comes a vector of int with size ..." etc.:
vector int size elem1 elem2 ... elemX
Depending on how many types you need to support, the type information could be 1 or more bytes. If the smallest "unknown" entities are your classes, then you need one indicator per class you aim to support.
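For instance, a 1-byte indicator could be as simple as the following sketch (the Tag name and values are hypothetical, not part of the question's code):
#include <cstdint>

// Hypothetical 1-byte type indicators, one per entity the stream may contain.
enum class Tag : uint8_t { Int32 = 1, Int64 = 2, Float = 3, Double = 4, DataClass = 5 };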
If you know exactly what should be in the stream, the type information for vector and int could be left out:
size elem1 elem2 ... elemX
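As a minimal sketch of that layout (encode_vec and the uint32_t size prefix are choices made here, assuming sender and receiver share endianness):
#include <cstdint>
#include <cstring>
#include <vector>

// Length-prefixed layout for a vector<int32_t>: [size][elem1][elem2]...[elemX]
std::vector<uint8_t> encode_vec(const std::vector<int32_t>& v) {
    const uint32_t size = static_cast<uint32_t>(v.size());
    std::vector<uint8_t> out(sizeof(size) + v.size() * sizeof(int32_t));
    std::memcpy(out.data(), &size, sizeof(size));          // size prefix
    if (!v.empty())
        std::memcpy(out.data() + sizeof(size), v.data(),
                    v.size() * sizeof(int32_t));           // the elements
    return out;
}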
Q2. Is there any better way of doing it?
One simplification could be to make serialize more generic so you can reuse it. If you have some std::vector<uint8_t> encode(const T& x) overloads for the fundamental types (and perhaps container types) you'd like to support, you could make it something like this:
template <class... Ts>
std::vector<uint8_t> serialize(Ts&&... ts) {
    std::vector<uint8_t> serialized_data;
    // encode every argument and append the resulting bytes in order
    [](auto& data, auto&&... vs) {
        (data.insert(data.end(), vs.begin(), vs.end()), ...);
    }(serialized_data, encode(ts)...);
    return serialized_data;
}
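The encode overloads for the fundamental types could look roughly like the sketch below (assumptions: trivially copyable arithmetic types, raw object representation, matching endianness, no tag or length prefix):
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Sketch: encode an arithmetic type as its raw object representation.
template <class T, std::enable_if_t<std::is_arithmetic_v<T>, int> = 0>
std::vector<uint8_t> encode(const T& x) {
    std::vector<uint8_t> bytes(sizeof(T));
    std::memcpy(bytes.data(), &x, sizeof(T));
    return bytes;
}
With such overloads in place, serialize(D.Int32, D.Int64, D.Float, D.Double) would produce the concatenated byte vector for the question's Data object.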
You could then write serialization for a class simply by calling serialize with all the member variables, which makes serialization of composite types pretty easy:
struct Foo {
    int32_t x;                  // encode(int32_t) needed
    std::string y;              // encode(const string&) needed
    std::vector<std::string> z; // encode(const vector<T>&) and encode(const string&) needed
};

std::vector<uint8_t> encode(const Foo& f) {
    return serialize(f.x, f.y, f.z);
}

struct Bar {
    Foo f;         // encode(const Foo&) needed
    std::string s; // encode(const string&) needed
};

std::vector<uint8_t> encode(const Bar& b) {
    return serialize(b.f, b.s);
}
The above makes encoding of classes pretty straightforward. To actually write the serialized data out, you could add an adapter which simply references the object to serialize, encodes it and writes the encoded data to an ostream:
struct BarSerializer {
    Bar& b;

    friend std::ostream& operator<<(std::ostream& os, const BarSerializer& bs) {
        auto s = encode(bs.b); // encode(const Bar&) needed
        return os.write(reinterpret_cast<const char*>(s.data()), s.size());
    }
};
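Usage is then just streaming the adapter, e.g. some_stream << BarSerializer{bar};, where bar is a hypothetical Bar instance and some_stream any std::ostream.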
You'd make the deserialize function template and decode overloads in a similar manner.
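A decode overload for arithmetic types might, as a rough sketch, look like this (the offset parameter and the non-generic deserialize for the question's Data class are additions made here for illustration):
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Sketch: read one arithmetic value from the byte stream and advance the offset.
template <class T, std::enable_if_t<std::is_arithmetic_v<T>, int> = 0>
T decode(const std::vector<uint8_t>& bytes, std::size_t& offset) {
    T x;
    std::memcpy(&x, bytes.data() + offset, sizeof(T));
    offset += sizeof(T);
    return x;
}

// Sketch: rebuild Data in the same order it was serialized.
Data deserialize(const std::vector<uint8_t>& bytes) {
    Data d;
    std::size_t offset = 0;
    d.Int32  = decode<int32_t>(bytes, offset);
    d.Int64  = decode<int64_t>(bytes, offset);
    d.Float  = decode<float>(bytes, offset);
    d.Double = decode<double>(bytes, offset);
    return d;
}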
CodePudding user response:
There is a high-throughput solution to this, but it comes with a number of caveats about where it works:
- You have a compiler and architecture that supports packed alignment. GCC, clang, ICC, and MSVC all can, but it depends on your architecture as to efficiency. Good news, probably: i386 / x86_64 is pretty much a legend in not paying a penalty for unaligned memory access. SIMD won't work, though.
- You have to be using POD members in your struct: std::vector, std::string, maps, sets, deques, lists, and smart pointers won't work here, but a bundle of ints and floats will be fine. It's possible to work around this with custom reimplementations of those other structures, but let's keep this one simple. You can embed other structs and so on, as long as they are also POD. (POD == Plain Old Data, https://en.cppreference.com/w/cpp/language/classes#POD_class)
- Your data is coming over the wire in the same endianness for sender and receiver. (Also workable around with custom data types that implement e.g. operator int32_t(), selected by endianness via #define or consteval.)
- Your communication channel is sending a single struct repeatedly, or can rely on a common header (for multiple struct types) to do dispatch in a switch.
Your code then becomes:
#pragma pack(push, 1)
struct Data
{
    int32_t Int32;
    int64_t Int64;
    float Float;
    double Double;
};
#pragma pack(pop)

const char * serialize(const Data& d)
{
    return reinterpret_cast<const char *>(&d);
}

const Data& deserialize(const char * buffer)
{
    return *reinterpret_cast<const Data*>(buffer);
}
The amount of data you need? Always sizeof(Data). So serialize will always give a const char * pointer to sizeof(Data) bytes of data, and you need to have read sizeof(Data) bytes to pass to deserialize.
Now, sure, you can pop all of this in and out of a std::vector<uint8_t>. But the neat thing here is that there are no memory copies required at all. You can literally use the object itself for serialization, and the raw char * data from whatever medium you deserialize from, without any copies or expensive field-by-field operations.
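For illustration, a roundtrip through a binary file could look like the sketch below (roundtrip and data.bin are placeholder names; any byte-oriented channel works the same way):
#include <fstream>

// Sketch: write the object's own bytes, then read exactly sizeof(Data) back.
void roundtrip(const Data& d)
{
    {
        std::ofstream out("data.bin", std::ios::binary);
        out.write(serialize(d), sizeof(Data));
    }
    char buffer[sizeof(Data)];
    std::ifstream in("data.bin", std::ios::binary);
    in.read(buffer, sizeof(Data));
    const Data& d2 = deserialize(buffer); // d2 views buffer directly, no copy
    (void)d2;
}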
Oh, and edited to add: things like Google protobufs or Cap'n Proto can probably help in the general case of the problems you might be looking to solve.