How does "transfer/casting" between 2 structs that have different structures works in C?-CodePudding

I'm learning HTTP protocol following a tutorial which gives an understandable piece of code and here's part of it.

struct sockaddr_in address;
...
address.sin_family = AF_INET;
address.sin_addr.s_addr = INADDR_ANY;
address.sin_port = htons( PORT );

memset(address.sin_zero, '\0', sizeof address.sin_zero);


if (bind(server_fd, (struct sockaddr *)&address, sizeof(address))<0)
{
    perror("In bind");
    exit(EXIT_FAILURE);
}

The example code works well, although I don't understand the some kind of transfer between two structs.

the definition of struct sockaddr_in in <netinet/in.h> is

struct sockaddr_in {
    __uint8_t   sin_len;
    sa_family_t sin_family;
    in_port_t   sin_port;
    struct  in_addr sin_addr;
    char        sin_zero[8];
};

the definition of struct sockaddr in <sys/socket.h> is

struct sockaddr {
    __uint8_t   sa_len;     /* total length */
    sa_family_t sa_family;  /* [XSI] address family */
    char        sa_data[14];    /* [XSI] addr value (actually larger) */
};

They have different structures, how the "transfer/casting" works there?

CodePudding user response：

I think this cast breaks the strict aliasing rule and then is undefined behaviour if the bind function dereferences the pointer.

In practice the code assumes that all fields of struct sockaddr_in are contiguous so you can access a buffer of bytes either as a struct sockaddr_in or as a struct sockaddr equivalently. But the fields of a structure are not guaranteed to be contiguous. If in_port_tis two bytes long for example, there may very well be a hole between sin_portand sin_addr with a 32 bytes machine compiler because it may want to align sin_addr field on 32 bytes address.

This way of coding is frequent when you develop a communication interface driver: you receive a buffer of bytes that need to be interpreted as a data structure (like: first byte is an adress, following bytes are a length, etc...). Casting from a structure to another one avoids to copy data.

Note that usually compilers provide non-standard-C ways to guarantee that all fields of structures are contigiuous. For example with gcc it is __attribute__((packed))

Now, to answer to your question: provided the structures are packed and there is no undefined behaviour, the cast basically does nothing. sa_data will be the array of bytes located after the field sin_family. So this array will consist of sin_port, followed by sin_addr followed by the array sin_zero.

CodePudding user response：

The casting works. Looking at the two structures:

struct sockaddr_in {
    __uint8_t   sin_len;
    sa_family_t sin_family;
    in_port_t   sin_port;
    struct in_addr sin_addr;
    char        sin_zero[8];
};

struct sockaddr {
    __uint8_t   sa_len;     /* total length */
    sa_family_t sa_family;  /* [XSI] address family */
    char        sa_data[14];    /* [XSI] addr value (actually larger) */
};

First two members, sin_len and sa_len, sin_family and sa_family will not be problematic as those are of the same data type. The padding for sa_family_t works exactly the same on both ends.

The header file netinet/in.h indicates that

in_port_t is Equivalent to the type uint16_t as defined in <inttypes.h>

For windows, struct in_addr looks like below:

struct in_addr {
    union {
        struct {
            u_char s_b1;
            u_char s_b2;
            u_char s_b3;
            u_char s_b4;
        } S_un_b;
        struct {
            u_short s_w1;
            u_short s_w2;
        } S_un_w;
        u_long S_addr;
    } S_un;
};

and that for a linux is:

struct in_addr {
   uint32_t s_addr;     /* address in network byte order */
};

The whole confusion you might have is because of how the contents align. However, it is a well-thought historic design. It is intended to accommodate implementation-dependent aspects in the design. When I Secondly, implementation-dependent -- it refers to the fact that implementation of in_addr_t is not consistent across all systems, as seen above.

In a nutshell, this entire magic works, because of the 2 things: The exact size and padding nature of the first two members and then lastly the data type of sa_data[14] is char, or more precisely an array of a 1-byte data-type. This design trick with union inside a struct has been widely used.

Unix Network Programming Volume 1 states:

The reason the sin_addr member is a structure, and not just an in_addr_t, is historical. Earlier releases (4.2BSD) defined the in_addr structure as a union of various structures, to allow access to each of the 4 bytes and to both of the 16-bit values contained within the 32-bit IPv4 address. This was used with class A, B, and C addresses to fetch the appropriate bytes of the address. But with the advent of subnetting and then the disappearance of the various address classes with classless addressing, the need for the union disappeared. Most systems today have done away with the union and just define in_addr as a structure with a single in_addr_t member.

Not what you asked for, but good to know:

The same header states:

The sockaddr_in structure is used to store addresses for the Internet address family. Values of this type shall be cast by applications to struct sockaddr for use with socket functions.

So, sockaddr_in is a struct specific to IP-based communication and sockaddr is more of a generic structure for socket operations.