C++: Casting unsigned char to a Structure

What I am trying to do

typedef struct {
    unsigned char a;
    unsigned char b;
    unsigned int  c;
} Packet;

unsigned char buffer[] = {1, 1, 0, 0, 0, 1};
Packet pkt = (Packet)buffer;

Basically, I am trying to cast a byte array to a structure in C++. When compiling, I get:

No matching function call for Packet::Packet(unsigned char[6])

Is this not possible or do I have to manually index into the array?

CodePudding user response：

There are a few ways to do this:

// packet.h
////////////////
struct Packet {
    unsigned char a;
    unsigned char b;
    unsigned int  c;
};

If you compile this and dump the struct layout with pahole, you will see the padding:

$ pahole -dr --structs main.o
struct Packet {
        unsigned char              a;                    /*     0     1 */
        unsigned char              b;                    /*     1     1 */

        /* XXX 2 bytes hole, try to pack */

        unsigned int               c;                    /*     4     4 */

        /* size: 8, cachelines: 1, members: 3 */
        /* sum members: 6, holes: 1, sum holes: 2 */
        /* last cacheline: 8 bytes */
};

So it's basically the 2 chars, 2 padding bytes and 4 bytes of an int for a total of 8 bytes.
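If pahole is not at hand, the same layout can be checked from inside the program. Below is a minimal sketch using offsetof and sizeof; the helper names hole_before_c and print_layout are made up for illustration, and the numbers reported depend on the compiler and target.

```cpp
#include <cstddef>   // offsetof, std::size_t
#include <cstdio>

struct Packet {
    unsigned char a;
    unsigned char b;
    unsigned int  c;
};

// Number of padding bytes the compiler placed between b and c.
constexpr std::size_t hole_before_c() {
    return offsetof(Packet, c) - (offsetof(Packet, b) + sizeof(unsigned char));
}

// Print member offsets, total size and the hole, similar to what pahole shows.
void print_layout() {
    std::printf("offset a=%zu b=%zu c=%zu size=%zu hole=%zu\n",
                offsetof(Packet, a), offsetof(Packet, b), offsetof(Packet, c),
                sizeof(Packet), hole_before_c());
}
```

Calling print_layout() on the same x86-64 build should agree with the pahole dump above (c at offset 4, size 8, a 2-byte hole), but treat that as something to verify rather than a guarantee.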

Because Intel is a little-endian platform, the least significant byte of c comes first in memory, as in:

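#include <cstdio>
#include "packet.h"

// Print the fields of the packet that pkt points to.
void print_packet( const Packet* pkt ) {
    printf( "a:%d b:%d c:%u\n", int(pkt->a), int(pkt->b), pkt->c );
}
int main() {
    // 8 bytes: a, b, 2 padding bytes, then c as a little-endian int
    unsigned char buffer[] = {1, 1, 0, 0, 1, 0, 0, 0};
    print_packet( (Packet*) buffer );                  // C-style cast
    print_packet( reinterpret_cast<Packet*>(buffer) ); // equivalent C++ cast
}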

Produces:

$ g++ main.cpp -o main
$ ./main
a:1 b:1 c:1
a:1 b:1 c:1

However, the packing can be changed from the command line. Below, the maximum member alignment is set to 2 bytes:

$ g++ -ggdb  main.cpp -o main -fpack-struct=2
$ pahole -dr --structs main
struct Packet {
        unsigned char              a;                    /*     0     1 */
        unsigned char              b;                    /*     1     1 */
        unsigned int               c;                    /*     2     4 */

        /* size: 6, cachelines: 1, members: 3 */
        /* last cacheline: 6 bytes */
} __attribute__((__packed__));

Now the Packet struct is only 6 bytes, and the result of running main is completely different:

$ ./main
a:1 b:1 c:65536
a:1 b:1 c:65536

This is because c now starts at offset 2, so it is read from the bytes 0, 0, 1, 0, which on a little-endian machine is 0x00010000, or 65536.
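The arithmetic can be spelled out without depending on the host's byte order. The sketch below assembles a little-endian 32-bit value by hand from the same buffer, once from offset 4 (the default layout) and once from offset 2 (the -fpack-struct=2 layout); load_le32 is a helper name invented for this illustration.

```cpp
#include <cstdint>

// Interpret the 4 bytes starting at p as a little-endian 32-bit value.
constexpr std::uint32_t load_le32(const unsigned char* p) {
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Same bytes as the buffer in main().
constexpr unsigned char buffer[] = {1, 1, 0, 0, 1, 0, 0, 0};

// Default layout: c sits at offset 4 and is built from the bytes 1, 0, 0, 0.
static_assert(load_le32(buffer + 4) == 1, "c read from offset 4");
// Packed to 2 bytes: c sits at offset 2 and is built from the bytes 0, 0, 1, 0.
static_assert(load_le32(buffer + 2) == 0x00010000, "c read from offset 2");
```

Both static_asserts hold at compile time: moving c two bytes earlier shifts the single 1 byte from the lowest position to the third one, which multiplies the value by 65536.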

So, rather than being at the mercy of compiler flags, it is better to fix the layout in the code itself, with explicit packing and an explicit reserved field:

// packet.h
////////////////
struct [[gnu::packed]] Packet {
    unsigned char a;
    unsigned char b;
    unsigned char reserved[2];
    unsigned int  c;
};

Then execution becomes

$ g++ -ggdb  main.cpp x.cpp -o main -fpack-struct=2
$ ./main
a:1 b:1 c:1
a:1 b:1 c:1
$ g++ -ggdb  main.cpp x.cpp -o main -fpack-struct=4
$ ./main
a:1 b:1 c:1
a:1 b:1 c:1
$ g++ -ggdb  main.cpp x.cpp -o main -fpack-struct=8
$ ./main
a:1 b:1 c:1
a:1 b:1 c:1
$ g++ -ggdb  main.cpp x.cpp -o main -fpack-struct=16
$ ./main
a:1 b:1 c:1
a:1 b:1 c:1

CodePudding user response：

First of all, your assumption that the byte representation of your structure is exactly what you wrote in the struct definition is wrong on most current architectures. For example, on a 32-bit architecture your definition will be equivalent to

struct Packet {
  char a;
  char b;
  char __hidden_padding[2];
  int c;
};

The same kind of padding, though not necessarily the same amount, appears on 64-bit architectures. To avoid this, you need to tell the compiler to "pack" the structure without padding bytes. There is no standard syntax for this, but most compilers provide a way to do it. For example, for gcc/clang you can write:

struct [[gnu::packed]] Packet {
  char a;
  char b;
  int c;
};

Warning: when working with such structures, taking the address of a member is not advised; see "Is gcc's __attribute__((packed)) / #pragma pack unsafe?".
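To illustrate the warning: in the packed struct, c sits at offset 2, so a pointer to it is not suitably aligned for an int. One way around it, sketched below, is to copy the member into a local variable first. The functions twice and twice_c are hypothetical and exist only for this example.

```cpp
#include <cstdint>

struct [[gnu::packed]] Packet {
    std::int8_t  a;
    std::int8_t  b;
    std::int32_t c;   // at offset 2, so not 4-byte aligned
};

// Hypothetical consumer that expects a properly aligned int32_t.
inline std::int32_t twice(const std::int32_t* p) { return *p * 2; }

inline std::int32_t twice_c(const Packet& pkt) {
    // Avoid: twice(&pkt.c);   // &pkt.c may be a misaligned pointer
    std::int32_t c = pkt.c;    // by-value read; the compiler knows c is packed
    return twice(&c);          // &c is an ordinary, aligned pointer
}
```

Reading pkt.c directly is fine because the compiler generates an unaligned-safe load for it; the problem only appears once the address escapes as a plain int32_t*.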

Now, since "simple" types like char, int, etc. have implementation-defined sizes, it is much better to use fixed-size types, and finally to check that the structure size is what you expect, as Evg suggested:

#include <cstdint>  // int8_t, int32_t

struct [[gnu::packed]] Packet {
  int8_t a;
  int8_t b;
  int32_t c;
};
static_assert(sizeof(Packet) == 6);

Copying is best done with std::bit_cast if you have C++20, or otherwise with memcpy. As far as I know, these two are the only standard-conforming ways today. Using *reinterpret_cast<Packet*>(buffer) is undefined behavior, though it still works with most compilers.
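A minimal sketch of the memcpy route, assuming gcc or clang for the [[gnu::packed]] attribute. It uses unsigned fixed-size types to match the unsigned char fields in the question, and packet_from_bytes is a name chosen just for this example.

```cpp
#include <cstdint>
#include <cstring>

struct [[gnu::packed]] Packet {
    std::uint8_t  a;
    std::uint8_t  b;
    std::uint32_t c;
};
static_assert(sizeof(Packet) == 6, "Packet must match the 6-byte buffer layout");

// Copy the first sizeof(Packet) bytes of data into a Packet.
// The caller must guarantee that data points to at least sizeof(Packet) bytes.
inline Packet packet_from_bytes(const unsigned char* data) {
    Packet pkt;
    std::memcpy(&pkt, data, sizeof pkt);
    return pkt;
}
```

With the buffer from the question, Packet pkt = packet_from_bytes(buffer); fills a and b with 1, while the value of c still depends on the machine's byte order. With C++20, std::bit_cast<Packet> applied to a std::array<unsigned char, 6> should be an equivalent alternative.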

CodePudding user response：

You can do this with a reinterpret_cast from the array:

Packet pkt = *reinterpret_cast<Packet*>(buffer);

What this does is decay the array into a pointer to its first element, then treat that pointer as a Packet* pointer. We then dereference it and copy the result into a new Packet structure. This circumvents essentially all compiler type and safety checks, so you need to be very careful here.

One thing we can do to make this a bit safer is to use a static_assert to ensure that the structure is the size that we expect. This will then fail to compile if the compiler inserts any padding into the structure definition.

static_assert(sizeof(Packet) == 6);

Depending on your compiler and compilation settings, it is almost certain that your structure as written is NOT 6 bytes.

Any time you use reinterpret_cast, you are working very close to the realm of undefined or compiler-dependent behavior. Generally speaking, as long as you do the padding checks and only deal with primitive data types inside the structure, things will work as you would expect, even if the code is technically undefined according to the C++ standard. Compiler writers realize that this type of code is often needed, and so generally support it in a sane way even though the C++ standard does not require them to.
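If relying on technically undefined behavior is not acceptable, the manual indexing mentioned in the question avoids the cast, the padding and the host byte order altogether. The sketch below assumes the 6-byte buffer holds a, b and then c in big-endian order, which is a guess based on the question's data (the trailing 0, 0, 0, 1 reads as 1); parse_packet and kWireSize are illustrative names.

```cpp
#include <cstddef>
#include <cstdint>

struct Packet {
    unsigned char a;
    unsigned char b;
    unsigned int  c;
};

// Assumption: the buffer holds a, b, then c as 4 bytes in big-endian order.
constexpr std::size_t kWireSize = 6;

// Build a Packet field by field; works regardless of padding or host endianness.
inline Packet parse_packet(const unsigned char (&buf)[kWireSize]) {
    Packet pkt{};
    pkt.a = buf[0];
    pkt.b = buf[1];
    pkt.c = (static_cast<std::uint32_t>(buf[2]) << 24)
          | (static_cast<std::uint32_t>(buf[3]) << 16)
          | (static_cast<std::uint32_t>(buf[4]) << 8)
          |  static_cast<std::uint32_t>(buf[5]);
    return pkt;
}
```

Taking the array by reference lets the compiler reject buffers of the wrong length. If the data is little-endian instead, the four shift amounts are simply reversed.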