I have created a function serialize which takes Data (a class containing 4 members: int32, int64, float, double) as input and returns an encoded vector of bytes of all elements, which I will then pass to a deserialize function to get the original data back.
std::vector<uint8_t> serialize(Data &D)
{
    std::vector<uint8_t> serialized_data;
    std::vector<uint8_t> intwo = encode(D.Int32); // output [32 13 24 0]
    std::vector<uint8_t> insf = encode(D.Int64);  // output [233 244 55 134 255 23 55]
    // float: encode in binary format
    float ft = D.Float; // float value, e.g. 4.55
    char result[sizeof(float)];
    memcpy(result, &ft, sizeof(ft));
    // double: encode in binary format
    double dt = D.Double; // double value, e.g. 4.55
    char resultdouble[sizeof(double)];
    memcpy(resultdouble, &dt, sizeof(dt));
    /////
    ///// How to bind everything here?
    /////
    return serialized_data;
}
Data deserialize(std::vector<uint8_t> &Bytes) // vector returned from the function above
{
    Data D2;
    D2.Int64 = decode(Bytes, D2);
    // D2.Int32 = decode(Bytes, D2);
    // D2.Float = decode(Bytes, D2);
    // D2.Double = decode(Bytes, D2);
    /// Return the original data (all class members)
    return D2;
}
I don't have any idea of how to move forward.
Q1. If I bind everything in a single vector, how would I dissect them while deserializing? Should there be some kind of delimiter?
Q2. Is there any better way of doing it?
CodePudding user response:
If I bind everything in a single vector, how would I dissect them while deserializing? Should there be some kind of delimiter?
In a stream, you either know what type comes next, or you'll have to have some sort of type indicator in the stream. "Here comes a vector of int with size ..." etc.:
vector int size elem1 elem2 ... elemX
Depending on how many types you need to support, the type information could be 1 or more bytes. If the smallest "unknown" entities are your classes, then you need one indicator per class you aim to support.
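For instance, a 1-byte indicator could be as simple as the following sketch (the Tag name and values are hypothetical, not part of the question's code):
#include <cstdint>

// Hypothetical 1-byte type indicators, one per entity the stream may contain.
enum class Tag : uint8_t { Int32 = 1, Int64 = 2, Float = 3, Double = 4, DataClass = 5 };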
If you know exactly what should be in the stream, the type information for vector and int could be left out:
size elem1 elem2 ... elemX
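As a minimal sketch of that layout (encode_vec and the uint32_t size prefix are choices made here, assuming sender and receiver share endianness):
#include <cstdint>
#include <cstring>
#include <vector>

// Length-prefixed layout for a vector<int32_t>: [size][elem1][elem2]...[elemX]
std::vector<uint8_t> encode_vec(const std::vector<int32_t>& v) {
    const uint32_t size = static_cast<uint32_t>(v.size());
    std::vector<uint8_t> out(sizeof(size) + v.size() * sizeof(int32_t));
    std::memcpy(out.data(), &size, sizeof(size));          // size prefix
    if (!v.empty())
        std::memcpy(out.data() + sizeof(size), v.data(),
                    v.size() * sizeof(int32_t));           // the elements
    return out;
}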
Q2. Is there any better way of doing it?
One simplification could be to make serialize more generic so you can reuse it. If you have some std::vector<uint8_t> encode(const T& x) overloads for the fundamental types (and perhaps container types) you'd like to support, you could make it something like this:
template <class... Ts>
std::vector<uint8_t> serialize(Ts&&... ts) {
    std::vector<uint8_t> serialized_data;
    // encode every argument and append the resulting bytes in order
    [](auto& data, auto&&... vs) {
        (data.insert(data.end(), vs.begin(), vs.end()), ...);
    }(serialized_data, encode(ts)...);
    return serialized_data;
}
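The encode overloads for the fundamental types could look roughly like the sketch below (assumptions: trivially copyable arithmetic types, raw object representation, matching endianness, no tag or length prefix):
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Sketch: encode an arithmetic type as its raw object representation.
template <class T, std::enable_if_t<std::is_arithmetic_v<T>, int> = 0>
std::vector<uint8_t> encode(const T& x) {
    std::vector<uint8_t> bytes(sizeof(T));
    std::memcpy(bytes.data(), &x, sizeof(T));
    return bytes;
}
With such overloads in place, serialize(D.Int32, D.Int64, D.Float, D.Double) would produce the concatenated byte vector for the question's Data object.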
You could then write serialization for a class simply by calling serialize with all the member variables, which makes serialization of composite types pretty easy:
struct Foo {
    int32_t x;                  // encode(int32_t) needed
    std::string y;              // encode(const string&) needed
    std::vector<std::string> z; // encode(const vector<T>&) and encode(const string&) needed
};

std::vector<uint8_t> encode(const Foo& f) {
    return serialize(f.x, f.y, f.z);
}

struct Bar {
    Foo f;         // encode(const Foo&) needed
    std::string s; // encode(const string&) needed
};

std::vector<uint8_t> encode(const Bar& b) {
    return serialize(b.f, b.s);
}
The above makes encoding of classes pretty straightforward. To actually write the serialized data out, you could add an adapter which simply references the object to serialize, encodes it and writes the encoded data to an ostream:
struct BarSerializer {
    Bar& b;

    friend std::ostream& operator<<(std::ostream& os, const BarSerializer& bs) {
        auto s = encode(bs.b); // encode(const Bar&) needed
        return os.write(reinterpret_cast<const char*>(s.data()), s.size());
    }
};
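Usage is then just streaming the adapter, e.g. some_stream << BarSerializer{bar};, where bar is a hypothetical Bar instance and some_stream any std::ostream.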
You'd make the deserialize function template and decode overloads in a similar manner.
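A decode overload for arithmetic types might, as a rough sketch, look like this (the offset parameter and the non-generic deserialize for the question's Data class are additions made here for illustration):
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Sketch: read one arithmetic value from the byte stream and advance the offset.
template <class T, std::enable_if_t<std::is_arithmetic_v<T>, int> = 0>
T decode(const std::vector<uint8_t>& bytes, std::size_t& offset) {
    T x;
    std::memcpy(&x, bytes.data() + offset, sizeof(T));
    offset += sizeof(T);
    return x;
}

// Sketch: rebuild Data in the same order it was serialized.
Data deserialize(const std::vector<uint8_t>& bytes) {
    Data d;
    std::size_t offset = 0;
    d.Int32  = decode<int32_t>(bytes, offset);
    d.Int64  = decode<int64_t>(bytes, offset);
    d.Float  = decode<float>(bytes, offset);
    d.Double = decode<double>(bytes, offset);
    return d;
}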
CodePudding user response:
There is a high-throughput solution to this, but it comes with a number of caveats about where it works:
- You have a compiler and architecture that supports packed alignment. GCC, clang, ICC, and MSVC all can, but it depends on your architecture as to efficiency. Good news, probably: i386 / x86_64 is pretty much a legend in not paying a penalty for unaligned memory access. SIMD won't work, though.
- You have to be using POD members in your struct: std::vector, std::string, maps, sets, deques, lists, and smart pointers won't work here, but a bundle of ints and floats will be fine. It's possible to work around this with custom reimplementations of those other structures, but let's keep this one simple. You can embed other structs and so on, as long as they are also POD. (POD == Plain Old Data, https://en.cppreference.com/w/cpp/language/classes#POD_class)
- Your data is coming over the wire in the same endianness for sender and receiver. (Also workable around with custom data types that implement e.g. operator int32_t(), selected by endianness via #define or consteval.)
- Your communication channel is sending a single struct repeatedly, or can rely on a common header (for multiple struct types) to do dispatch in a switch.
Your code then becomes:
#pragma pack(push, 1)
struct Data
{
    int32_t Int32;
    int64_t Int64;
    float Float;
    double Double;
};
#pragma pack(pop)

const char * serialize(const Data& d)
{
    return reinterpret_cast<const char *>(&d);
}

const Data& deserialize(const char * buffer)
{
    return *reinterpret_cast<const Data*>(buffer);
}
The amount of data you need? Always sizeof(Data). So serialize will always give a const char * pointer to sizeof(Data) bytes of data, and you need to have read sizeof(Data) bytes to pass to deserialize.
Now, sure, you can pop all of this in and out of a std::vector<uint8_t>. But the neat thing here is that there are no memory copies required at all. You can literally use the object itself for serialization, and the raw char * data from whatever medium you deserialize from, without any copies or expensive field-by-field operations.
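For illustration, a roundtrip through a binary file could look like the sketch below (roundtrip and data.bin are placeholder names; any byte-oriented channel works the same way):
#include <fstream>

// Sketch: write the object's own bytes, then read exactly sizeof(Data) back.
void roundtrip(const Data& d)
{
    {
        std::ofstream out("data.bin", std::ios::binary);
        out.write(serialize(d), sizeof(Data));
    }
    char buffer[sizeof(Data)];
    std::ifstream in("data.bin", std::ios::binary);
    in.read(buffer, sizeof(Data));
    const Data& d2 = deserialize(buffer); // d2 views buffer directly, no copy
    (void)d2;
}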
Oh, and edited to add: things like Google protobufs or Cap'n Proto can probably help in the general case of the problems you might be looking to solve.