Home > Net >  process array in chunks using struct then cast as flat array - how to avoid UB (strict aliasing)?
process array in chunks using struct then cast as flat array - how to avoid UB (strict aliasing)?

Time:12-19

An external API expects a pointer to an array of values (int as simple example here) plus a size.

It is logically clearer to deal with the elements in groups of 4.

So process elements via a "group of 4" struct and then pass the array of those structs to the external API using a pointer cast. See code below.

Spider sense says: "strict aliasing violation" in the reinterpret_cast => possible UB?

  1. Are the static_asserts below enough to ensure: a) this works in practice b) this is actually standards compliant and not UB?

  2. Otherwise, what do I need to do, to make it "not UB". A union? How exactly please?

  3. or, is there overall a different, better way?


#include <cstddef>

void f(int*, std::size_t) {
    // external implementation
    // process array
}

int main() {

    static constexpr std::size_t group_size    = 4;
    static constexpr std::size_t number_groups = 10;
    static constexpr std::size_t total_number  = group_size * number_groups;

    static_assert(total_number % group_size == 0);

    int vals[total_number]{};

    struct quad {
        int val[group_size]{};
    };

    quad vals2[number_groups]{};
    // deal with values in groups of four using member functions of `quad`

    static_assert(alignof(int) == alignof(quad));
    static_assert(group_size * sizeof(int) == sizeof(quad));
    static_assert(sizeof(vals) == sizeof(vals2));

    f(vals, total_number);
    f(reinterpret_cast<int*>(vals2), total_number); /// is this UB? or OK under above asserts?
}

CodePudding user response:

No, this is not permitted. The relevant C standard section is §7.6.1.10. From the first paragraph, we have (emphasis mine)

The result of the expression reinterpret_­cast<T>(v) is the result of converting the expression v to type T. If T is an lvalue reference type or an rvalue reference to function type, the result is an lvalue; if T is an rvalue reference to object type, the result is an xvalue; otherwise, the result is a prvalue and the lvalue-to-rvalue, array-to-pointer, and function-to-pointer standard conversions are performed on the expression v. Conversions that can be performed explicitly using reinterpret_­cast are listed below. No other conversion can be performed explicitly using reinterpret_­cast.

So unless your use case is listed on that particular page, it's not valid. Most of the sections are not relevant to your use case, but this is the one that comes closest.

An object pointer can be explicitly converted to an object pointer of a different type.[58] When a prvalue v of object pointer type is converted to the object pointer type “pointer to cv T”, the result is static_­cast<cv T*>(static_­cast<cv void*>(v)).

So a reinterpret_cast from one pointer type to another is equivalent to a static_cast through an appropriately cv-qualified void*. Now, a static_cast that goes from T* to S* can be acceptably used as a S* if the types T and S are pointer-interconvertible. From §6.8.4

Two objects a and b are pointer-interconvertible if:

  • they are the same object, or
  • one is a union object and the other is a non-static data member of that object ([class.union]), or
  • one is a standard-layout class object and the other is the first non-static data member of that object or any base class subobject of that object ([class.mem]), or
  • there exists an object c such that a and c are pointer-interconvertible, and c and b are pointer-interconvertible.

If two objects are pointer-interconvertible, then they have the same address, and it is possible to obtain a pointer to one from a pointer to the other via a reinterpret_­cast ([expr.reinterpret.cast]).

[Note 4: An array object and its first element are not pointer-interconvertible, even though they have the same address. — end note]

To summarize, you can cast a pointer to a class C to a pointer to its first member (and back) if there's no vtable to stop you. You can cast a pointer to C into another pointer to C (that can come up if you're adding cv-qualifiers; for instance, reinterpret_cast<const C*>(my_c_ptr) is valid if my_c_ptr is C*). There are also some special rules for unions, which don't apply here. However, you can't factor through arrays, as per Note 4. The conversion you want here is quad[] -> quad -> int -> int[], and you can't convert between the quad[] and the quad. If quad was a simple struct that contained only an int, then you could reinterpret a quad* as an int*, but you can't do it through arrays, and certainly not through a nested layer of them.

None of the sections I've cited say anything about alignment. Or size. Or packing. Or padding. None of that matters. All your static_asserts are doing is slightly increasing the probability that the undefined behavior (which is still undefined) will happen to work on more compilers. But you're using a bandaid to repair a dam; it's not going to work.

CodePudding user response:

No amount of static_asserts is going to make something which is categorically UB into well-defined behavior in accord with the standard. You did not create an array of ints; you created a struct containing an array of ints. So that's what you have.

It's legal to convert a pointer to a quad into a pointer to an int[group_size] (though you'll need to alter your code appropriately. Or you could just access the array directly and cast that to an int*.

Regardless of how you get a pointer to the first element, it's legal to do pointer arithmetic within that array. But the moment you try to do pointer arithmetic past the boundaries of the array within that quad object, you achieve undefined behavior. Pointer arithmetic is defined based on the existence of an array: [expr.add]/4

When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.

  • If P evaluates to a null pointer value and J evaluates to 0, the result is a null pointer value.
  • Otherwise, if P points to an array element i of an array object x with n elements ([dcl.array]), the expressions P J and J P (where J has the value j) point to the (possibly-hypothetical) array element i j of x if 0≤i j≤n and the expression P - J points to the (possibly-hypothetical) array element i−j of x if 0≤i−j≤n.
  • Otherwise, the behavior is undefined.

The pointer isn't null, so case 1 doesn't apply. The n above is group_size (because the array is the one within quad), so if the index is > group_size, then case 2 doesn't apply.

Therefore, undefined behavior will happen whenever someone tries to access the array past index 4. There is no cast that can wallpaper over that.


Otherwise, what do I need to do, to make it "not UB". A union? How exactly please?

You don't. What you're trying to do is simply not valid with respect to the C object model. You need an array of ints, so you must create an array of ints. You cannot treat an array of something other than ints as an array of ints (well, with minor exceptions of byte-wise arrays, but that's unhelpful to you).


The simplest valid way to process the array in groups is to just... do some nested loops:

int arr[total_number];
for(int* curr = arr; curr != std::end(arr); curr  = 4)
{
  //Use `curr[0]` to `curr[3]`;
  //Or create a `std::span<int, 4> group(curr)`;
}
  • Related