Hello, I have the following structure:
struct TestStruct{
unsigned char a;
unsigned char b;
unsigned char c;
unsigned char d;
};
struct TestStruct test;
test.a = 0x01;
test.b = 0x02;
test.c = 0x01;
test.d = 0x02;
unsigned short int *ptr = (unsigned short int *)&test;
printf("%x\n", *ptr);
I want to get the value 0x0102, but I actually get 0x0201. How can I fix this without reordering the fields in the struct? I want to keep the order because I am building an IP header from scratch (for learning purposes), and for readability I want the fields in the same order as in the RFC documentation.
Thanks in advance.
CodePudding user response:
Computers have a concept called endianness. In short, when storing a multi-byte value, you must choose between storing the most significant byte first (big-endian) or the least significant byte first (little-endian). RFC documents usually call this the byte order, and network protocols such as IP use big-endian, also known as network byte order.
If you are writing code that exchanges data between systems of different endianness, you need to be aware of the byte order in which values arrive. The header byteswap.h (a glibc extension, not standard C) provides macros that swap byte order efficiently. Consider the following example program:
#include <stdio.h>
#include <byteswap.h>
int main(void) {
unsigned int x = 0x01020304;
unsigned char * arr = (unsigned char *)&x;
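printf("int: %08x\n", x);
/* Print the bytes of x in the order they sit in memory. */
printf("raw: %02x %02x %02x %02x\n", arr[0], arr[1], arr[2], arr[3]);
x = bswap_32(x); /* reverse the byte order of x */
printf("swapped\n");
printf("int: %08x\n", x);
printf("raw: %02x %02x %02x %02x\n", arr[0], arr[1], arr[2], arr[3]);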
}
On my computer, it outputs:
int: 01020304
raw: 04 03 02 01
swapped
int: 04030201
raw: 01 02 03 04
This shows that my computer is little-endian. For the integer 0x01020304, it stores the byte 0x04 at the lowest memory address.
For network use specifically, Linux provides functions that convert between network and host byte order. They have the benefit of already 'knowing' your machine's byte order and handling the conversion for you. For example, here is an old snippet I wrote that parses the headers of ARP packets:
recvfrom(socket->fd, buffer, ETHER_FRAME_MAX_SIZE, 0, NULL, NULL);
frame->type = ntohs(frame->type);
frame->htype = ntohs(frame->htype);
frame->ptype = ntohs(frame->ptype);
frame->oper = ntohs(frame->oper);
This snippet converts the shorts in the struct into host byte order using ntohs (short for network-to-host-short), which is declared in arpa/inet.h.
CodePudding user response:
Your implementation assumes that your machine is big-endian, which is usually not true on modern machines.
Big-endian machines store multi-byte values with the most significant byte at the lowest address and the least significant byte at the highest address, while little-endian machines (which are more common these days) do the exact opposite. For instance, this is how each architecture would represent the 4-byte value 0x01020304 if it were stored at memory addresses 0x10-0x13:
Endianness | Byte 0x10 | Byte 0x11 | Byte 0x12 | Byte 0x13 |
---|---|---|---|---|
Big | 0x01 | 0x02 | 0x03 | 0x04 |
Little | 0x04 | 0x03 | 0x02 | 0x01 |
The C standard requires your compiler to place the members of your struct in the order in which they are declared, so when you fill the struct and then use type punning to interpret the memory as a 2-byte int instead of (effectively) an array of 1-byte ints, the computer decides which byte is more significant and which is less significant based on its own endianness.
To make the computer interpret a multi-byte value with the endianness you expect, regardless of the machine, use bit shifting to move each byte into its proper place. For instance, using your struct:
unsigned short fixedEndianness = ((unsigned short)test.a << 8) | (unsigned short)test.b;
...which will work on any architecture.