Is strict aliasing one-way?

I believe 6.5p7 in the C standard defines the so-called strict aliasing rule as follows.

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

a type compatible with the effective type of the object,

a qualified version of a type compatible with the effective type of the object,

a type that is the signed or unsigned type corresponding to the effective type of the object,

a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,

an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or

a character type.

Here's a simple example that shows GCC optimizing on the assumption that the rule is obeyed.

int IF(int *i, float *f) {
    *i = -1;
    *f = 0;
    return *i;
}

IF:
        mov     DWORD PTR [rdi], -1
        mov     eax, -1
        mov     DWORD PTR [rsi], 0x00000000
        ret

The load for return *i is omitted assuming that int and float cannot alias.

Now consider the sixth case, which says an object may be accessed by a character type lvalue expression (for example, through a char *).

int IC(int *i, char *c) {
    *i = -1;
    *c = 0;
    return *i;
}

IC:
        mov     DWORD PTR [rdi], -1
        mov     BYTE PTR [rsi], 0
        mov     eax, DWORD PTR [rdi]
        ret

Now there is a load for return *i because i and c could overlap according to the rules, and *c = 0 could change what's in *i.

Then can we also modify a char through an int *? Should the compiler care that such a thing might happen?

char CI(char *c, int *i) {
    *c = -1;
    *i = 0;
    return *c;
}

CI: #GCC
        mov     BYTE PTR [rdi], -1
        mov     DWORD PTR [rsi], 0
        movzx   eax, BYTE PTR [rdi]
        ret

CI: #Clang
        mov     byte ptr [rdi], -1
        mov     dword ptr [rsi], 0
        mov     al, byte ptr [rdi]
        ret

Looking at the assembly output, both GCC and Clang seem to think a char can be modified by access through int *.

Maybe it's obvious that A and B overlapping means A overlaps B and B overlaps A. However, I found this detailed answer which emphasizes in boldface that,

Note that may_alias, like the char* aliasing rule, only goes one way: it is not guaranteed to be safe to use int32_t* to read a __m256. It might not even be safe to use float* to read a __m256. Just like it's not safe to do char buf[1024]; int *p = (int*)buf;.

Now I'm really confused. That answer is also about GCC vector types, which carry the may_alias attribute so that they can alias other types the way a char can.

At least in the following example, GCC seems to think overlapping access can happen in both directions.

int IV(int *i, __m128i *v) {
    *i = -1;
    *v = _mm_setzero_si128();
    return *i;
}

__m128i VI(int *i, __m128i *v) {
    *v = _mm_set1_epi32(-1);
    *i = 0;
    return *v;
}

IV:
        pxor    xmm0, xmm0
        mov     DWORD PTR [rdi], -1
        movaps  XMMWORD PTR [rsi], xmm0
        mov     eax, DWORD PTR [rdi]
        ret
VI:
        pcmpeqd xmm0, xmm0
        movaps  XMMWORD PTR [rsi], xmm0
        mov     DWORD PTR [rdi], 0
        movdqa  xmm0, XMMWORD PTR [rsi]
        ret

https://godbolt.org/z/ab5EMx3bb

But am I missing something? Is strict aliasing one-way?

Additionally, after reading the current answers and comments, I thought maybe this code is not allowed by the standard.

typedef struct {int i;} S;
S s;
int *p = (int *)&s;
*p = 1;

Note that (int *)&s is different from &s.i. My current interpretation is that an object of type S is being accessed by an lvalue expression of type int, and this case is not listed in 6.5p7.

CodePudding user response：

Yes, it's only one-way, but from inside the function the compiler can't tell which direction applies.

Given this:

char CI(char *c, int *i) {
    *c = -1;
    *i = 0;
    return *c;
}

It could have been called like this:

int a;
char *p = ((char *)&a) + 1;
char b = CI(p,&a);

Which is a valid use of aliasing. So from inside of the function, *i = 0 is correctly setting a in the calling function, and *c = -1 is correctly setting one byte inside of a.
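To make that concrete, here is a minimal sketch of such a call. The helper call_ci_overlapping and the initial value of a are made up for illustration; CI is the function from the question. Because the int store lands on the byte that *c just wrote, the final read of *c has to come from memory, which is why both compilers emit the reload.

```c
#include <assert.h>

/* Same function as in the question. */
char CI(char *c, int *i) {
    *c = -1;    /* writes one byte of the int that i points to (see caller) */
    *i = 0;     /* overwrites every byte of that int, including *c */
    return *c;  /* must be reloaded: the int store changed it */
}

/* Hypothetical caller: c points at the second byte of the int behind i. */
char call_ci_overlapping(void) {
    int a = 0x01020304;
    assert(sizeof a > 1);       /* the second byte must exist */
    char *p = (char *)&a + 1;   /* char access to an int object is allowed */
    return CI(p, &a);
}
```

With this caller the function returns 0, not -1, so caching the stored value of *c in a register would give the wrong result.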

CodePudding user response：

You can take a pointer to any object, cast it to a char* and use that to access the bit patterns underlying said object. You can also cast a char* obtained this way back to its original type.

So when the compiler sees int *i and char *p it cannot exclude the possibility that p was created by casting from i. So they may point to the same raw memory, and changing one may change the other. In that sense it goes both ways. But that is not what the text is about.

What this is about is casting from A* to char* and then to B*. The object pointed to doesn't magically become a B, and accessing it through a B* is undefined behavior. Maybe one-way is the wrong word; I don't know a better name for it. But for every object there is a train with only 2 stops: A* and char* (unsigned char*, signed char*, const char*, ... and all its variants). You can go back and forth as many times as you like, but you can never change tracks and go to B*.
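A rough sketch of the two stops, plus the usual way to get data from a char buffer into another type without changing tracks. All function names here are hypothetical, and memcpy is my suggestion rather than something from the quoted answer: it copies bytes into a real int object instead of dereferencing (int *)buf.

```c
#include <assert.h>
#include <string.h>

/* A* -> char*: inspecting an int's bytes through unsigned char * is allowed. */
int count_nonzero_bytes(const int *value) {
    const unsigned char *bytes = (const unsigned char *)value;
    int count = 0;
    for (size_t k = 0; k < sizeof *value; ++k)
        count += (bytes[k] != 0);
    return count;
}

/* char buffer -> int: copy the bytes into an int object rather than
   reading through (int *)buf, which the aliasing rule does not permit. */
int load_int(const char *buf) {
    int out;
    assert(buf != NULL);
    memcpy(&out, buf, sizeof out);
    return out;
}

/* Hypothetical round trip: int -> bytes in a char array -> int. */
int round_trip(int x) {
    char buf[sizeof x];
    memcpy(buf, &x, sizeof x);
    return load_int(buf);
}
```

Compilers typically turn a fixed-size memcpy like this into a plain load, so the safe version usually costs nothing.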

Does that help?

The may_alias attribute sets up another such rail system, allowing aliasing between int[4] and __m128i* because that is exactly the overlap the compiler needs for vectorization. But that's something you have to look up in the compiler's documentation.
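As a small illustration of that attribute outside of vector types, here is a sketch that assumes GCC or Clang (may_alias is a compiler extension, not standard C) and assumes float is a 32-bit IEEE 754 type. The typedef name aliasing_u32 and the function float_bits are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* GCC/Clang extension: accesses through this type are treated like
   character-type accesses, i.e. they may alias any object. */
typedef uint32_t __attribute__((__may_alias__)) aliasing_u32;

/* Read the representation of a float through the may_alias type. */
uint32_t float_bits(const float *f) {
    assert(sizeof(float) == sizeof(uint32_t));  /* assumption: 32-bit float */
    return *(const aliasing_u32 *)f;
}
```

The permission follows the attributed type only: reading a uint32_t object through a plain float * is still not covered, which is the "one-way" part.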

CodePudding user response：

To understand how the "Strict Aliasing Rule" applies in any particular situation, one must define two concepts which are referenced in N1570 6.5p7 but not actually defined within the Standard:

For purposes of N1570 6.5p7, under what circumstances is a region of storage considered to contain an object of any particular type? In particular for your use case, what does it mean for something to be 'copied as an array of character type'?
What does it mean for an object to be accessed "by" an lvalue of a particular type?

There has never been a consensus as to how those concepts should be specified, which makes it impossible for anyone to know what the rules "mean". The Standard seems to be intended to unambiguously support scenarios where a region of storage is created via malloc() or other such means, then written exclusively using character types, and then accessed via one other type, or those in which storage is written exclusively using one non-character type and then read exclusively via character types, but other scenarios are a bit murkier.
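The first of those scenarios might look like the following sketch. The function name is hypothetical; the storage comes from malloc(), is written only through unsigned char, and is then read through a single other type (unsigned).

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate storage, write it only via a character type, then read it
   via exactly one non-character type. */
unsigned read_after_byte_writes(void) {
    unsigned *storage = malloc(sizeof *storage);
    assert(storage != NULL);

    unsigned char *bytes = (unsigned char *)storage;
    for (size_t k = 0; k < sizeof *storage; ++k)
        bytes[k] = 0;           /* character-type stores only */

    unsigned value = *storage;  /* the single non-character access */
    free(storage);
    return value;               /* all-zero bytes represent 0 */
}
```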

More significantly, while clang and gcc support those scenarios using character types, the sets of scenarios accommodated by clang and gcc omit some corner cases where the Standard is unambiguous, but which don't fit the abstraction model used by clang and gcc. Regardless of what the rules say, programmers should expect that the -fstrict-aliasing dialects of clang and gcc do not accommodate the possibility that storage which has ever been accessed via any non-character type might be accessed by any other within its lifetime, even if storage is always read using the last type with which it was written.