C - Type punning, Strict aliasing, and Endianness

I've recently read about type punning and strict aliasing in C. I believe the following attempt at type-punning violates the strict aliasing rule:

uint32_t x = 0;
float f = *(float *)&x;

In order to type-pun correctly, Wikipedia says "the strict alias rule is broken only by an explicit memcpy or by using a char pointer as a 'middle man' (since those can be freely aliased)."

So, my first question is: does the following code violate the strict aliasing rule (or invoke undefined/unspecified behavior)? Several sources say this is legal and fine, while others say it's not:

uint32_t x = 0;
float f = *(float *)(char *)&x;

If so, how could this code be fixed while still using the same "char pointer as a 'middle man'" idea? Or would I have to use memcpy or a union instead?
If not, why? How would casting to char* and then to float* be any "safer" than simply casting to float* (or is safety not the issue)?

My second question regards endianness, since that also seems to come up when discussing type-punning.

If I malloc() some memory for two different data types (assuming it is properly aligned), could reading one or the other give different results on different platforms? As an example:

float *p = malloc(sizeof(uint32_t) + sizeof(float)); // Allocating space for a uint32_t and a float

uint32_t *a = (uint32_t *)(char *)p;
float *b = (float *)((char *)p + sizeof(uint32_t));

// Use a and b, etc.

Could this change based on the endianness of the system? I'd assume not since I'm not using the value of a float read as an integer; the integer is being used as an integer and the float is being used as a float.

I still consider myself a beginner in C, so I'm guessing there are better ways to do things like this; maybe type punning isn't necessary at all. Feel free to include alternative solutions in your answer. Thanks.

Answer:

Wikipedia is wrong; passing the address through a “char pointer” is insufficient. The access itself must be made through an lvalue of character type.

C 2018 6.5 7 says:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

…

— a character type.

In *(float *)(char *)&x, you do not use a character type to access x. First the address of x is converted to a char *, then it is converted to a float *, and then * is applied. Since * is applied to a float *, this accesses the object as a float. The fact that the address was converted through a char * at one point is irrelevant; the access is done as a float, and that does not conform to the rule in 6.5 7.

To access the bytes of an object using a character type, you can use:

unsigned char *px = (unsigned char *) &x;
unsigned char *pf = (unsigned char *) &f;
for (size_t i = 0; i < sizeof x; ++i)
    pf[i] = px[i];

Then, since px is an unsigned char *, the access px[i] is through an unsigned char type, which conforms to the rule in 6.5 7.
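For reference, here is that byte-copy loop packaged as a complete function. This is only a sketch: the name u32_bits_to_float is hypothetical, and the example values assume float is a 32-bit IEEE 754 binary32 type stored with the same byte order as uint32_t, which is common but not required by the C standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The byte copy only makes sense if both objects have the same size. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Reinterpret the bits of x as a float by copying its bytes through
   unsigned char lvalues, which 6.5 7 permits for any object.
   Hypothetical helper name; assumes IEEE 754 binary32 floats. */
float u32_bits_to_float(uint32_t x)
{
    float f;
    unsigned char *px = (unsigned char *) &x;
    unsigned char *pf = (unsigned char *) &f;
    for (size_t i = 0; i < sizeof x; ++i)
        pf[i] = px[i];
    return f;
}
```

Under those assumptions, u32_bits_to_float(0x3f800000u) yields 1.0f.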

Since there is a standard library routine to do this copy, you can also write memcpy(&f, &x, sizeof x);.

Good compilers with optimization enabled will implement memcpy(&f, &x, sizeof x); as a single move operation rather than an actual byte-by-byte copy, circumstances permitting.
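A sketch of the memcpy approach in both directions follows. The names float_to_bits and bits_to_float are hypothetical (they are not standard library functions), and the example values assume a 32-bit IEEE 754 binary32 float with the same byte order as uint32_t.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* View the bits of a float as an unsigned 32-bit integer.
   memcpy accesses both objects as bytes, so no aliasing rule is broken. */
uint32_t float_to_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Inverse: build a float from a 32-bit pattern. */
float bits_to_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}
```

Because each call copies a fixed four bytes between local objects, this is the kind of copy that optimizing compilers typically reduce to a single register move, though that is a quality-of-implementation matter rather than a guarantee.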

How would casting to char* and then to float* be any "safer" than simply casting to float* (or is safety not the issue)?

As explained above, it is not safer. However, if we ask “How is accessing through a character type safer than accessing an int through a float type?”, then the answer is that it conforms to the rule in 6.5 7 and puts the compiler on notice that aliasing may be occurring. Normally, if a function is passed a pointer to a float and a pointer to an int, the compiler is allowed to assume they point to different objects, and so any changes you make to things through the int pointer will not affect things used through the float pointer, and vice-versa. The compiler may optimize code based on this assumption that the two pointers do not point to the same thing. However, if you copy bytes through character lvalues, including by using memcpy, the compiler is required to allow for the possibility that those lvalues could be accessing any object (except that other rules may have additional restrictions, such as not modifying const objects).
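To illustrate the compiler's assumption, here is a sketch with two hypothetical functions. In the first, the compiler may return 1 without re-reading *ip, because a store through a float * is presumed not to modify an int. In the second, the stores go through an unsigned char *, so the compiler must allow that they changed *ip and re-read it. The first function must never be called with both pointers referring to the same storage; doing so would be undefined behavior.

```c
#include <stddef.h>

/* int and float are incompatible types, so the compiler may assume
   ip and fp refer to different objects and return 1 directly. */
int store_int_then_float(int *ip, float *fp)
{
    *ip = 1;
    *fp = 2.0f;
    return *ip;
}

/* A character pointer may alias any object, so the compiler must
   assume the byte stores could have modified *ip and reload it. */
int store_int_then_bytes(int *ip, unsigned char *cp, size_t n)
{
    *ip = 1;
    for (size_t i = 0; i < n; ++i)
        cp[i] = 0;
    return *ip;
}
```

Calling store_int_then_bytes(&v, (unsigned char *) &v, sizeof v) is well defined and returns 0, since an int whose bytes are all zero has the value zero.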

Could this change based on the endianness of the system?

The C standard allows for float and int objects to have different endianness, but this is uncommon in C implementations. As long as they have the same endianness, copying the bytes of an int into a float or vice-versa will have the expected result (of revealing the encoding of a float object in a natural way in the bits of an int).
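The sketch below checks these assumptions on a given implementation, using hypothetical function names. It inspects byte order through unsigned char lvalues and checks whether 1.0f and 0x3f800000u (its IEEE 754 binary32 encoding) have identical bytes. It also reproduces the malloc layout from the question: because the uint32_t and the float occupy separate bytes and each is written and read only as its own type, no byte is ever reinterpreted, so endianness does not affect the values read back. Like the question, it assumes the float at offset sizeof(uint32_t) is suitably aligned, which holds where float needs at most 4-byte alignment.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Inspect the lowest-addressed byte of a known value through an
   unsigned char lvalue: 0x04 there means little-endian. */
int u32_is_little_endian(void)
{
    uint32_t x = 0x01020304u;
    const unsigned char *p = (const unsigned char *) &x;
    return p[0] == 0x04;
}

/* Returns 1 if 1.0f and its assumed binary32 encoding stored as a
   uint32_t have identical bytes, i.e. both types use the same byte
   order on this implementation. */
int float_and_u32_share_layout(void)
{
    float f = 1.0f;
    uint32_t u = 0x3f800000u;
    return sizeof f == sizeof u && memcmp(&f, &u, sizeof u) == 0;
}

/* The question's layout: a uint32_t followed by a float in one
   allocation. Each object is accessed only as its own type.
   Assumes the float at offset sizeof(uint32_t) is suitably aligned. */
int separate_objects_roundtrip(void)
{
    unsigned char *p = malloc(sizeof(uint32_t) + sizeof(float));
    if (p == NULL)
        return 0;
    uint32_t *a = (uint32_t *) p;
    float *b = (float *) (p + sizeof(uint32_t));
    *a = 123456789u;
    *b = 1.5f;
    int ok = (*a == 123456789u && *b == 1.5f);
    free(p);
    return ok;
}
```

On typical desktop and mobile platforms, float_and_u32_share_layout() returns 1, while separate_objects_roundtrip() returns 1 regardless of byte order.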