I believe 6.5p7 in the C standard defines the so-called strict aliasing rule as follows.
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
- a type compatible with the effective type of the object,
- a qualified version of a type compatible with the effective type of the object,
- a type that is the signed or unsigned type corresponding to the effective type of the object,
- a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
- an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
- a character type.
Here's a simple example that shows GCC optimizing on the assumption that this rule holds.
int IF(int *i, float *f) {
    *i = -1;
    *f = 0;
    return *i;
}
IF:
mov DWORD PTR [rdi], -1
mov eax, -1
mov DWORD PTR [rsi], 0x00000000
ret
The load for return *i is omitted on the assumption that int and float cannot alias.
Then let's consider case 6 (the last item in the list), which says an object may be accessed by a character type lvalue expression (for example through a char *).
int IC(int *i, char *c) {
    *i = -1;
    *c = 0;
    return *i;
}
IC:
mov DWORD PTR [rdi], -1
mov BYTE PTR [rsi], 0
mov eax, DWORD PTR [rdi]
ret
Now there is a load for return *i because i and c could overlap according to the rules, and *c = 0 could change what's in *i.
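To see why that reload is needed, here is a minimal sketch of a caller that makes the two pointers overlap. The wrapper name call_IC_overlapping is made up for illustration; IC is the function from above.

```c
#include <assert.h>

/* Same function as above. */
int IC(int *i, char *c) {
    *i = -1;
    *c = 0;      /* may overwrite one byte of *i */
    return *i;   /* so the value has to be reloaded */
}

/* Hypothetical caller: c points at the first byte of the same int. */
int call_IC_overlapping(void) {
    int x = 0;
    int r = IC(&x, (char *)&x);
    assert(r == x);  /* the returned value reflects the byte store */
    return r;
}
```

Accessing the bytes of an int through a char * is permitted by case 6, so the result is no longer -1; the exact value depends on byte order.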
Then can we also modify a char through an int *? Should the compiler care that such a thing might happen?
char CI(char *c, int *i) {
    *c = -1;
    *i = 0;
    return *c;
}
CI: #GCC
mov BYTE PTR [rdi], -1
mov DWORD PTR [rsi], 0
movzx eax, BYTE PTR [rdi]
ret
CI: #Clang
mov byte ptr [rdi], -1
mov dword ptr [rsi], 0
mov al, byte ptr [rdi]
ret
Looking at the assembly output, both GCC and Clang seem to think a char can be modified by an access through int *.
Maybe it's obvious that A and B overlapping means A overlaps B and B overlaps A. However, I found this detailed answer, which emphasizes in boldface that:
Note that may_alias, like the char* aliasing rule, only goes one way: it is not guaranteed to be safe to use int32_t* to read a __m256. It might not even be safe to use float* to read a __m256. Just like it's not safe to do char buf[1024]; int *p = (int*)buf;.
Now I'm really confused. That answer is also about GCC vector types, which have a may_alias attribute so that they can alias other types much as char can.
At least in the following example, GCC seems to think overlapping access can happen in both directions.
#include <immintrin.h>

int IV(int *i, __m128i *v) {
    *i = -1;
    *v = _mm_setzero_si128();
    return *i;
}
__m128i VI(int *i, __m128i *v) {
    *v = _mm_set1_epi32(-1);
    *i = 0;
    return *v;
}
IV:
pxor xmm0, xmm0
mov DWORD PTR [rdi], -1
movaps XMMWORD PTR [rsi], xmm0
mov eax, DWORD PTR [rdi]
ret
VI:
pcmpeqd xmm0, xmm0
movaps XMMWORD PTR [rsi], xmm0
mov DWORD PTR [rdi], 0
movdqa xmm0, XMMWORD PTR [rsi]
ret
https://godbolt.org/z/ab5EMx3bb
But am I missing something? Is strict aliasing one-way?
Additionally, after reading the current answers and comments, I thought maybe this code is not allowed by the standard.
typedef struct {int i;} S;
S s;
int *p = (int *)&s;
*p = 1;
Note that (int *)&s is different from &s.i. My current interpretation is that an object of type S is being accessed by an lvalue expression of type int, and this case is not listed in 6.5p7.
CodePudding user response:
Yes, it's only one way, but from inside the function the compiler can't tell which direction it is looking at.
Given this:
char CI(char *c, int *i) {
    *c = -1;
    *i = 0;
    return *c;
}
It could have been called like this:
int a;
char *p = ((char *)&a) + 1;
char b = CI(p,&a);
Which is a valid use of aliasing. So from inside the function, *i = 0 correctly sets a in the calling function, and *c = -1 correctly sets one byte inside a.
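Put together as something you can compile, the call might look like the sketch below. The wrapper call_CI_overlapping and its out_a parameter are names invented here; the sketch assumes sizeof(int) is at least 2 so that the second byte exists.

```c
#include <assert.h>

char CI(char *c, int *i) {
    *c = -1;
    *i = 0;
    return *c;
}

/* Hypothetical wrapper around the call shown above. */
char call_CI_overlapping(int *out_a) {
    int a = 12345;
    assert(sizeof(int) >= 2);      /* assumption: a has a second byte */
    char *p = ((char *)&a) + 1;    /* points at a byte inside a */
    char b = CI(p, &a);            /* *i = 0 overwrites the byte *c just set */
    *out_a = a;
    return b;
}
```

Because the store through i lands after the store through c, the byte read back is 0 rather than -1, which is exactly why the compiler has to reload *c.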
CodePudding user response:
You can take a pointer to any object, cast it to a char* and use that to access the bit patterns underlying said object. You can also cast a char* obtained this way back to its original type.
So when the compiler sees int *i and char *p it cannot exclude the possibility that p was created by casting from i. So they may point to the same raw memory, and changing one may change the other. In that sense it goes both ways. But that is not what the text is about.
What this is about is casting from A* to char* and then to B*. The object pointed to doesn't magically become a B, and accessing it through a B* is undefined behavior. Maybe one-way is the wrong word; I don't know a better name for it. But for every object there is a train with only 2 stops: A* and char* (unsigned char*, signed char*, const char*, ... and all its variants). You can go back and forth as many times as you like, but you can never change tracks and go to B*.
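As a rough sketch of the two stops, with float playing the role of A and uint32_t the role of B (the function names are made up for this example): the round trip through unsigned char* is fine, while getting at the bits as a B is done by copying bytes instead of casting the pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Allowed: A* -> char* -> A*, then access as A. */
float round_trip(float *f) {
    unsigned char *bytes = (unsigned char *)f;  /* stop 2: char* */
    float *back = (float *)bytes;               /* back to stop 1: A* */
    return *back;
}

/* Not allowed: *(uint32_t *)f would change tracks to B*.
   Copying the bytes with memcpy is the well-defined alternative. */
uint32_t float_bits(float f) {
    uint32_t bits;
    assert(sizeof bits == sizeof f);  /* assumption: 32-bit float */
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

On a platform with IEEE 754 single-precision floats, float_bits(1.0f) gives 0x3F800000.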
Does that help?
The may_alias attribute sets up another such rail system, allowing aliasing between int[4] and __m128i* because that is exactly the overlap the compiler needs for vectorization. But that's something you have to look up in the compiler's documentation.
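For a user-defined type, a sketch of the same idea with the GCC/Clang may_alias extension could look like this. The typedef name u32_alias and the function name are invented for the example, and it assumes float and uint32_t have the same size and alignment.

```c
#include <assert.h>
#include <stdint.h>

/* GCC/Clang extension: accesses through this type are treated like
   character-type accesses for alias analysis. */
typedef uint32_t __attribute__((__may_alias__)) u32_alias;

/* Reads the bits of a float through the may_alias type. */
uint32_t float_bits_may_alias(const float *f) {
    assert(sizeof(float) == sizeof(uint32_t));  /* assumption: 32-bit float */
    return *(const u32_alias *)f;
}
```

This is not standard C; it relies on the compiler honoring the attribute, so it only applies to GCC, Clang and compilers that copy their behavior.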
CodePudding user response:
To understand how the "Strict Aliasing Rule" applies in any particular situation, one must define two concepts which are referenced in N1570 6.5p7 but not actually defined within the Standard:
For purposes of N1570 6.5p7, under what circumstances is a region of storage considered to contain an object of any particular type? In particular for your use case, what does it mean for something to be 'copied as an array of character type'?
What does it mean for an object to be accessed "by" an lvalue of a particular type?
There has never been a consensus as to how those concepts should be specified, which makes it impossible for anyone to know what the rules "mean". The Standard seems to be intended to unambiguously support scenarios where a region of storage is created via malloc() or other such means, then written exclusively using character types, and then accessed via one other type, as well as those in which storage is written exclusively using one non-character type and then read exclusively via character types, but other scenarios are a bit murkier.
More significantly, while clang and gcc support those scenarios using character types, the sets of scenarios accommodated by clang and gcc omit some corner cases where the Standard is unambiguous, but which don't fit the abstraction model used by clang and gcc. Regardless of what the rules say, programmers should expect that the -fstrict-aliasing dialects of clang and gcc do not accommodate the possibility that storage which has ever been accessed via any non-character type might be accessed by any other within its lifetime, even if storage is always read using the last type with which it was written.
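A minimal sketch of the first, clearly supported scenario (malloc'd storage filled by byte copies and then accessed through exactly one other type) might look like this. The function name is made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Storage from malloc has no declared type. It is filled with a byte
   copy and afterwards accessed only as int. */
int write_bytes_then_read_int(int value) {
    int *p = malloc(sizeof *p);
    assert(p != NULL);                /* sketch only: no real error handling */
    memcpy(p, &value, sizeof value);  /* character-wise write */
    int result = *p;                  /* the one non-character access type */
    free(p);
    return result;
}
```

The storage is never touched through a second non-character type, so this stays inside the pattern that both the Standard and the -fstrict-aliasing dialects are expected to handle.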