C/C++ Little/Big Endian handler

There are two systems that communicate via TCP. One uses little endian and the other big endian. The ICD between the systems contains a lot of structs (fields). Byte-swapping each field individually does not look like the best solution. Is there a generic solution or common practice for handling communication between systems with different endianness?

CodePudding user response：

Each system may have a different architecture, but endianness should be defined by the communication protocol. If the protocol says "data must be sent as big endian", then that's how the system sends it and how the other system receives it.

I am guessing you're asking because you would like to cast a struct pointer to a char* and just send it over the wire, which won't work when the two ends differ in endianness.

That is generally a bad idea. It's far better to create an actual serializer, so that your internal data is decoupled from the actual protocol, which also means you can easily add support for different protocols in the future, or different versions of the protocols. You also don't have to worry about struct padding, aliasing, or any implementation-defined issues that casting brings along.

(update)

So generally, you would have something like:

void Serialize(const struct SomeStruct *s, struct BufferBuilder *bb)
{
    BufferBuilder_append_u16_le(bb, s->SomeField);
    BufferBuilder_append_s32_le(bb, s->SomeOther);
    
    ...
    
    BufferBuilder_append_u08(bb, s->SomeOther);
}

Where you would already have all these methods written in advance, like

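// append unsigned 16-bit value, little endian
// (assumes bb->buffer is a uint8_t * that acts as the write cursor)
void BufferBuilder_append_u16_le(struct BufferBuilder *bb, uint16_t value)
{
    if (bb->remaining < sizeof(value))
    {
        return; // or some error handling, whatever
    }

    // write the bytes explicitly, so the output is little endian on any host
    bb->buffer[0] = (uint8_t)(value & 0xFF);
    bb->buffer[1] = (uint8_t)(value >> 8);

    // advance the cursor so the next append does not overwrite this one
    bb->buffer += sizeof(value);
    bb->remaining -= sizeof(value);
}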

We use this approach because it's simpler to unit test these "appending" methods in isolation, and writing (de)serializers is then a matter of just calling them in succession.
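To make the idea concrete, here is a minimal, self-contained sketch of such a builder. The struct layout (`buffer`, `remaining`), the `failed` flag and `BufferBuilder_init` are assumptions made for illustration, not something the protocol or the snippets above prescribe. Each append writes bytes with shifts, so the output is little endian regardless of the host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical builder: a write cursor plus the space left behind it. */
struct BufferBuilder {
    uint8_t *buffer;    /* next byte to write */
    size_t   remaining; /* bytes still available */
    int      failed;    /* set once an append did not fit */
};

void BufferBuilder_init(struct BufferBuilder *bb, uint8_t *storage, size_t size)
{
    assert(bb != NULL && storage != NULL);
    bb->buffer = storage;
    bb->remaining = size;
    bb->failed = 0;
}

/* Append a single byte. */
void BufferBuilder_append_u08(struct BufferBuilder *bb, uint8_t value)
{
    if (bb->failed || bb->remaining < 1) {
        bb->failed = 1;
        return;
    }
    *bb->buffer++ = value;
    bb->remaining--;
}

/* Append unsigned 16-bit value, least significant byte first. */
void BufferBuilder_append_u16_le(struct BufferBuilder *bb, uint16_t value)
{
    if (bb->failed || bb->remaining < 2) {
        bb->failed = 1; /* refuse partial writes */
        return;
    }
    BufferBuilder_append_u08(bb, (uint8_t)(value & 0xFFu));
    BufferBuilder_append_u08(bb, (uint8_t)(value >> 8));
}

/* Append signed 32-bit value, little endian, via its unsigned bit pattern. */
void BufferBuilder_append_s32_le(struct BufferBuilder *bb, int32_t value)
{
    uint32_t u = (uint32_t)value;
    if (bb->failed || bb->remaining < 4) {
        bb->failed = 1;
        return;
    }
    for (int i = 0; i < 4; i++) {
        BufferBuilder_append_u08(bb, (uint8_t)(u >> (8 * i)));
    }
}
```

Recording the failure in the builder, instead of returning a status from every append, lets a serializer call the appends back to back and check `failed` once at the end. That is one possible design choice among several.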

But of course, if you can pick any protocol and implement both systems, then you could simply use protobuf and avoid doing a bunch of plumbing.

CodePudding user response：

Generally speaking, values transmitted over a network should be in network byte order, i.e. big endian. So values should be converted from host byte order to network byte order for transmission and converted back when received.

The functions htons and ntohs do this for 16 bit integer values and htonl and ntohl do this for 32 bit integer values. On little endian systems these functions essentially reverse the bytes, while on big endian systems they're a no-op.
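As a small illustration of what these functions do to the bytes, the sketch below stores a 32-bit value in network byte order and reads it back. It assumes a POSIX system where `<arpa/inet.h>` is available; the helper names `put_u32_network` and `get_u32_network` are made up for this example.

```c
#include <arpa/inet.h> /* htonl, ntohl (POSIX) */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store host_value into out[0..3] in network (big endian) byte order. */
void put_u32_network(uint8_t out[4], uint32_t host_value)
{
    assert(out != NULL);
    uint32_t net = htonl(host_value); /* byte swap on little endian hosts, no-op on big endian */
    memcpy(out, &net, sizeof net);
}

/* Read a network byte order value from in[0..3] and return it in host order. */
uint32_t get_u32_network(const uint8_t in[4])
{
    assert(in != NULL);
    uint32_t net;
    memcpy(&net, in, sizeof net);
    return ntohl(net);
}
```

Whatever the host endianness, `put_u32_network(wire, 0x11223344)` leaves the bytes `11 22 33 44` in `wire`, and `get_u32_network(wire)` returns the original value.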

So for example if you have the following struct:

struct mystruct {
    char f1[10];
    uint32_t f2;
    uint16_t f3;
};

Then you would serialize the data like this:

// s points to the struct to serialize
// p should be large enough to hold the serialized struct
// (needs <string.h> for memcpy and, on POSIX, <arpa/inet.h> for htonl/htons)
void serialize(struct mystruct *s, unsigned char *p)
{
    memcpy(p, s->f1, sizeof(s->f1));
    p += sizeof(s->f1);

    uint32_t f2_tmp = htonl(s->f2);
    memcpy(p, &f2_tmp, sizeof(f2_tmp));
    p += sizeof(s->f2);

    uint16_t f3_tmp = htons(s->f3);
    memcpy(p, &f3_tmp, sizeof(f3_tmp));
}

And deserialize it like this:

// s points to a struct which will store the deserialized data
// p points to the buffer received from the network
// (needs <string.h> for memcpy and, on POSIX, <arpa/inet.h> for ntohl/ntohs)
void deserialize(struct mystruct *s, unsigned char *p)
{
    memcpy(s->f1, p, sizeof(s->f1));
    p += sizeof(s->f1);

    uint32_t f2_tmp;
    memcpy(&f2_tmp, p, sizeof(f2_tmp));
    s->f2 = ntohl(f2_tmp);
    p += sizeof(s->f2);

    uint16_t f3_tmp;
    memcpy(&f3_tmp, p, sizeof(f3_tmp));
    s->f3 = ntohs(f3_tmp);
}

While you could use compiler specific flags to pack the struct so that it has a known size, allowing you to memcpy the whole struct and just convert the integer fields, doing so means that certain fields may not be aligned properly which can be a problem on some architectures. The above will work regardless of the overall size of the struct.

CodePudding user response：

You mention one problem with struct fields. Transmitting structs also requires taking care of the alignment of fields, which causes gaps between them and is typically controlled with compiler flags.
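The gaps are easy to see by comparing the in-memory size of a struct with the size its fields would take back to back. The sketch below uses a hypothetical `struct sample`; the actual amount of padding is implementation-defined, so the code computes it rather than assuming a number.

```c
#include <assert.h>
#include <stddef.h> /* offsetof, size_t */
#include <stdint.h>

/* Hypothetical message layout, used only for illustration. */
struct sample {
    char     f1[10];
    uint32_t f2;
    uint16_t f3;
};

/* Bytes the fields occupy when written back to back, with no gaps. */
#define SAMPLE_WIRE_SIZE (sizeof(char[10]) + sizeof(uint32_t) + sizeof(uint16_t))

/* Total padding the compiler added to the in-memory layout. */
size_t sample_padding_bytes(void)
{
    assert(sizeof(struct sample) >= SAMPLE_WIRE_SIZE);
    return sizeof(struct sample) - SAMPLE_WIRE_SIZE;
}

/* Gap between the end of f1 and the start of f2. */
size_t sample_gap_before_f2(void)
{
    return offsetof(struct sample, f2)
         - (offsetof(struct sample, f1) + sizeof(char[10]));
}
```

On common ABIs where `uint32_t` is 4-byte aligned, `f2` typically starts at offset 12 rather than 10, so sending `sizeof(struct sample)` raw bytes would put padding on the wire.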

For binary data one can use Abstract Syntax Notation One (ASN.1), where you define the data format. There are some alternatives, like Protocol Buffers.

In C one can use macros to determine endianness and the field offsets inside a struct, and hence use such a struct description as the basis for a generic bytes-to-struct conversion. This would work independently of endianness and alignment. You would need to create such a descriptor for every struct.
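A minimal sketch of such a descriptor-driven conversion could look like the following. All names (`field_desc`, `FIELD`, `decode_struct`, `struct message`) are hypothetical, and the wire format is assumed to be big endian with fields packed back to back. This variant decodes integers with shifts, so it does not need to detect the host endianness at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Kinds of fields this hypothetical descriptor table understands. */
enum field_kind { FIELD_BYTES, FIELD_U16, FIELD_U32 };

struct field_desc {
    enum field_kind kind;
    size_t offset; /* offsetof() the field inside the struct */
    size_t size;   /* bytes the field takes on the wire */
};

struct message {
    char     name[10];
    uint32_t id;
    uint16_t flags;
};

#define FIELD(type, member, kind) \
    { kind, offsetof(type, member), sizeof(((type *)0)->member) }

/* One descriptor table per struct, listing fields in wire order. */
static const struct field_desc message_fields[] = {
    FIELD(struct message, name,  FIELD_BYTES),
    FIELD(struct message, id,    FIELD_U32),
    FIELD(struct message, flags, FIELD_U16),
};

/* Decode big endian wire bytes into dst, driven only by the table.
   Returns the number of bytes consumed, or 0 if the input is too short. */
size_t decode_struct(void *dst, const struct field_desc *fields, size_t count,
                     const uint8_t *wire, size_t wire_len)
{
    size_t pos = 0;
    assert(dst != NULL && fields != NULL && wire != NULL);
    for (size_t i = 0; i < count; i++) {
        unsigned char *field = (unsigned char *)dst + fields[i].offset;
        if (fields[i].size > wire_len - pos) {
            return 0;
        }
        switch (fields[i].kind) {
        case FIELD_BYTES:
            memcpy(field, wire + pos, fields[i].size);
            break;
        case FIELD_U16: {
            uint16_t v = (uint16_t)((wire[pos] << 8) | wire[pos + 1]);
            memcpy(field, &v, sizeof v); /* memcpy avoids alignment assumptions */
            break;
        }
        case FIELD_U32: {
            uint32_t v = ((uint32_t)wire[pos] << 24) | ((uint32_t)wire[pos + 1] << 16) |
                         ((uint32_t)wire[pos + 2] << 8) | (uint32_t)wire[pos + 3];
            memcpy(field, &v, sizeof v);
            break;
        }
        }
        pos += fields[i].size;
    }
    return pos;
}
```

The encoder would walk the same table in the opposite direction, so each struct needs one descriptor table rather than a pair of handwritten functions.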

Alternatively a parser might generate code for bytes-to-struct conversion.

But then again you could use a language neutral solution like ASN.1.

C and C++ of course have no introspection/reflection capabilities like Java has, so these are the only solutions.

CodePudding user response：

The fastest and most portable way is to use bit shifts. These have the big advantage that you only need to know the network endianness, never the CPU endianness.

Example:

uint8_t  buf[4] = { MS_BYTE, ... LS_BYTE}; // some buffer from TCP/IP = Big Endian
uint32_t my_u32 = ((uint32_t)buf[0] << 24) |
                  ((uint32_t)buf[1] << 16) |
                  ((uint32_t)buf[2] <<  8) |
                  ((uint32_t)buf[3] <<  0) ;
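The same shifts can be wrapped in small helpers, together with the matching write direction. The function names below are made up for illustration; the only assumption is that the wire format is big endian.

```c
#include <assert.h>
#include <stdint.h>

/* Read a 32-bit big endian value from buf[0..3]. */
uint32_t load_u32_be(const uint8_t *buf)
{
    assert(buf != 0);
    return ((uint32_t)buf[0] << 24) |
           ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] <<  8) |
           ((uint32_t)buf[3] <<  0);
}

/* Write a 32-bit value to buf[0..3], most significant byte first. */
void store_u32_be(uint8_t *buf, uint32_t value)
{
    assert(buf != 0);
    buf[0] = (uint8_t)(value >> 24);
    buf[1] = (uint8_t)(value >> 16);
    buf[2] = (uint8_t)(value >>  8);
    buf[3] = (uint8_t)(value >>  0);
}

/* 16-bit counterparts. */
uint16_t load_u16_be(const uint8_t *buf)
{
    assert(buf != 0);
    return (uint16_t)((buf[0] << 8) | buf[1]);
}

void store_u16_be(uint8_t *buf, uint16_t value)
{
    assert(buf != 0);
    buf[0] = (uint8_t)(value >> 8);
    buf[1] = (uint8_t)(value & 0xFFu);
}
```

Because the helpers only index bytes and shift, they give the same result on little and big endian hosts and work on unaligned buffers.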

Do not use (bit-field) structs or type punning directly on the input. They are poorly standardized, may involve padding and alignment requirements, and depend on endianness. It is fine to use structs if you have proper serialization/deserialization routines in between. A deserialization routine may contain the above bit shifts, for example.
Do not use pointer arithmetic to iterate across the input, or plain memcpy(). Neither of these solves the endianness issue.
Do not use htons etc bloat libs, because they are non-portable. But more importantly, because anyone who can't write a simple bit shift like the above without having some lib function holding their hand should probably stick to writing high level code in a more family-friendly programming language.

There is no point in writing code in C if you don't have a clue about how to do efficient, close to the hardware programming, also known as the very reason you picked C for the task to begin with.

EDIT
Helping hand for people who are confused over how C code gets translated to asm: https://godbolt.org/z/TT1MP7oc4. As we can see, the machine code is identical on x86 Linux. The htonl version won't compile on a number of embedded targets, nor on MSVC, and it leads to worse performance on MIPS64.