Memory alignment and strict aliasing for continuous block of raw bytes

Time:03-02

I have a question about using the same continuous block of raw bytes as storage for objects of various types, from the point of view of the C++ standard rules. Suppose we create a continuous block of raw bytes, e.g.

void *data = ::operator new(100); // 100 bytes of raw data - not typed

Could we then use this memory like this:

template<class T>
T* get(std::size_t bshift)
{
    return static_cast<T*>(
        reinterpret_cast<void*>(
            reinterpret_cast<unsigned char*>(data) + bshift
        )
    );
}

  1. Is it safe or UB? Why?
float &fvalue2 = *get<float>(sizeof(float));
fvalue2 = 0.02f;

  2. Is it safe or UB? Why?
float &fvalue_reserve = *get<float>(sizeof(float)*2 + 1);
fvalue_reserve = 0.03f;

  3. Is it safe or UB? Why?
double &dvalue = *get<double>(sizeof(float)*3 + 1);
dvalue = 0.04;

  4. Is it safe or UB? Why?
// assume that somehow there is no unused internal padding in struct
struct POD_struct{
    float fvalue1;                   //[0..sizeof(float)]
    float fvalue2;                   //[sizeof(float)..sizeof(float)*2]
    char reserved[sizeof(float) + 1];//[sizeof(float)*2..sizeof(float)*3+1]
    double dvalue;                   //[sizeof(float)*3+1..sizeof(float)*3+1+sizeof(double)]
};

POD_struct pod_struct;
std::memcpy(&pod_struct, get<void*>(0), sizeof(POD_struct));
pod_struct.fvalue2 == 0.02f; //true ??
pod_struct.dvalue == 0.04; //true ??

  5. Is it safe or UB? Why? Is it any different from №4?
// assume that somehow there is no unused internal padding in struct
struct POD_struct{
    float fvalue1;                   //[0..sizeof(float)]
    float fvalue2;                   //[sizeof(float)..sizeof(float)*2]
    char reserved[sizeof(float) + 1];//[sizeof(float)*2..sizeof(float)*3+1]
    double dvalue;                   //[sizeof(float)*3+1..sizeof(float)*3+1+sizeof(double)]
};

POD_struct pod_struct;
std::memcpy(&pod_struct, get<POD_struct*>(0), sizeof(POD_struct));
pod_struct.fvalue2 == 0.02f; //true ??
pod_struct.dvalue == 0.04; //true ??

  6. Would it make any significant difference if we "fill" the memory pointed to by data starting not at offset zero (e.g. at 1, 2, or 3) and then memcpy from that offset into a POD_struct object?
  7. Is it safe to assume that if we make the continuous raw memory buffer large enough to fit all POD members without padding (aligned to 1 byte), it is OK to interpret it as any POD type? Is it safe to reuse the raw memory and interpret it as storage for another POD type once we are done using it?

Answer:

First, it is unclear from your question, but I will assume that there is no other code in between the individual snippets you are showing.

Snippet 1 has undefined behavior because the pointer returned by get cannot actually point to a float object. ::operator new does implicitly create objects and returns a pointer to a suitable created object, but that object would have to be an unsigned char object that is an element of an unsigned char array in order to give the pointer arithmetic in reinterpret_cast<unsigned char*>(data) + bshift defined behavior.

However, then the return value of get<float>(sizeof(float)) would also be a pointer to an unsigned char object, and writing to an unsigned char object through a float glvalue violates the aliasing rules.

This could be remedied either by using std::launder before returning the pointer from get or, better, by explicitly creating the object:

template<class T>
T* get(std::size_t bshift)
{
    return new(reinterpret_cast<unsigned char*>(data) + bshift) T;
}

However, this creates a new object with an indeterminate value each time get is called.
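A minimal sketch of the explicit-creation approach, with two changes that are assumptions of this sketch rather than part of the answer: the hypothetical helper `emplace` forwards an initializer to placement new, so the new object never starts with an indeterminate value, and it asserts the bounds and alignment preconditions discussed below. `storage` and `storage_size` are hypothetical stand-ins for the question's `data` and its 100-byte size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <utility>

// Hypothetical stand-in for the question's `data`. ::operator new returns
// memory suitably aligned for any type without new-extended alignment.
void* storage = ::operator new(100);
constexpr std::size_t storage_size = 100;

// Hypothetical helper: explicitly create a T at byte offset `bshift`,
// initialized from `args`, and return a pointer to the new object.
template<class T, class... Args>
T* emplace(std::size_t bshift, Args&&... args)
{
    assert(bshift + sizeof(T) <= storage_size);  // stay inside the allocation
    unsigned char* p = static_cast<unsigned char*>(storage) + bshift;
    // Creating an object at a misaligned address is UB, so check first.
    assert(reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0);
    // T(args...) with no arguments value-initializes (0.0f for float).
    return ::new (static_cast<void*>(p)) T(std::forward<Args>(args)...);
}
```

Keep the returned pointer and use it for later accesses instead of calling `emplace` again, since each call starts the lifetime of a fresh object at that address.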

std::launder would be sufficient here, without creating a new object, because ::operator new can implicitly create an unsigned char array that provides storage for a float object, which is also implicitly created (assuming that all objects used this way fit in the storage, are correctly aligned (see below), do not overlap, and are of implicit-lifetime types):

template<class T>
T* get(std::size_t bshift)
{
    return std::launder(reinterpret_cast<T*>(reinterpret_cast<unsigned char*>(data) + bshift));
}
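For comparison, a sketch of the std::launder variant in use. It assumes C++20 semantics, under which ::operator new implicitly creates objects (C++17 is enough for std::launder itself to compile), and it only touches offsets that are multiples of each type's alignment and do not overlap. `raw` is a hypothetical stand-in for `data`.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical stand-in for the question's `data`.
void* raw = ::operator new(100);

// The launder-based get from the answer, with `raw` in place of `data`.
template<class T>
T* get(std::size_t bshift)
{
    // Pointer arithmetic on unsigned char, then std::launder to obtain a
    // pointer to the implicitly created T object at that address.
    return std::launder(reinterpret_cast<T*>(
        reinterpret_cast<unsigned char*>(raw) + bshift));
}
```

Offsets 4 and 16 are used in the checks because they are correctly aligned for float and double on common platforms; the question's offsets 9 and 13 are not.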

However, 2. and 3. have undefined behavior even with this modification if alignof(float) != 1 (which is very likely to be true). You cannot create and start the lifetime of an object with wrong alignment either implicitly or explicitly. (Although it may technically be possible to create an object with wrong alignment explicitly without starting its lifetime.)
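One way to stay within these rules is to compute offsets instead of hard-coding them. The sketch below uses two hypothetical helpers: `align_up` rounds an offset up to the next multiple of a type's alignment, and `offset_ok_for` checks an offset, assuming the base pointer comes from ::operator new and is therefore aligned for any type without new-extended alignment. Assuming a 4-byte, 4-byte-aligned float and an 8-byte-aligned double, the question's offsets 9 and 13 fail the check, and `align_up` moves them to 12 and 16.

```cpp
#include <cassert>
#include <cstddef>

// Round `offset` up to the next multiple of `align`.
// alignof values are always powers of two, which the bit mask relies on.
inline std::size_t align_up(std::size_t offset, std::size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

// True if a T placed at `offset` from a suitably aligned base
// (such as memory from ::operator new) is correctly aligned.
template<class T>
bool offset_ok_for(std::size_t offset)
{
    return offset % alignof(T) == 0;
}
```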

For 4. and 5., assuming the above weren't undefined behavior due to misalignment or out-of-bounds access and given the assumptions in the question, I think these snippets should have defined behavior. Note however that it is extremely unlikely that these requirements are satisfied.
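To get the effect of 4. and 5. without the "no padding" assumption, a sketch can write each member's bytes at the offset the compiler actually uses, as reported by offsetof, and then copy the whole object representation with std::memcpy. The function `roundtrip` is hypothetical; the struct mirrors the question's POD_struct but lets the compiler insert whatever padding it needs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

// Same members as the question's POD_struct; the real padding is
// chosen by the compiler and queried with offsetof below.
struct POD_struct {
    float fvalue1;
    float fvalue2;
    char reserved[sizeof(float) + 1];
    double dvalue;
};

// Hypothetical helper: build a POD_struct object representation in raw
// bytes, then copy it into a real POD_struct object.
POD_struct roundtrip(float f2, double d)
{
    void* raw = ::operator new(sizeof(POD_struct));
    unsigned char* bytes = static_cast<unsigned char*>(raw);
    std::memset(bytes, 0, sizeof(POD_struct));

    // Write each member at its real offset, not a hand-computed one.
    std::memcpy(bytes + offsetof(POD_struct, fvalue2), &f2, sizeof f2);
    std::memcpy(bytes + offsetof(POD_struct, dvalue), &d, sizeof d);

    POD_struct s;
    std::memcpy(&s, bytes, sizeof s);  // copy the object representation
    ::operator delete(raw);
    return s;
}
```

Using offsetof is what removes the padding assumption; the rest is the same byte copying that the question performs.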

For 6., if you offset everything you need to again take care not to go out-of-bounds of the allocation and not to violate the alignment of any of the involved types. (Including the alignment of POD_struct for 5.)

For 7., the formulation is very vague, so I am not sure what you mean. But in general you cannot interpret memory as a type other than the type of the object that actually lives there. In your examples 4. and 5. you are copying object representations, which is different.
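To illustrate the difference, the sketch below (hypothetical `float_bits`) obtains the bit pattern of a float as an integer by copying its object representation with std::memcpy, instead of reading the float through a std::uint32_t glvalue, which would violate the aliasing rules. It assumes a 32-bit IEEE 754 float, which the static_asserts check at compile time.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Assumptions of this sketch, checked at compile time.
static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes a 32-bit float");
static_assert(std::numeric_limits<float>::is_iec559, "assumes IEEE 754 float");

// Copy the bytes of `f` into an integer. No float object is ever
// accessed through an integer glvalue, so no aliasing rule is broken.
std::uint32_t float_bits(float f)
{
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}
```

In C++20, std::bit_cast<std::uint32_t>(f) from <bit> expresses the same copy directly.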


To be clear again: Practically speaking your code has UB due to the alignment violations. This probably extends even to platforms that allow unaligned access, because the compiler may optimize code based on the assumption of pointer alignment. Compilers may offer type annotations to indicate that a pointer may be unaligned. You need to use these or other tools if you want to implement unaligned access in practice.
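A common portable tool for unaligned access is to copy bytes with std::memcpy rather than dereference a misaligned typed pointer. The sketch below uses hypothetical helpers `store_at` and `load_at` to write and read trivially copyable values at arbitrary byte offsets, including the question's misaligned ones, without ever forming a misaligned T*. Optimizing compilers commonly lower such fixed-size copies to ordinary loads and stores, but that is a quality-of-implementation matter, not a guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>
#include <type_traits>

// Hypothetical helper: write the bytes of `value` at `base + offset`.
template<class T>
void store_at(void* base, std::size_t offset, const T& value)
{
    static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");
    std::memcpy(static_cast<unsigned char*>(base) + offset, &value, sizeof(T));
}

// Hypothetical helper: read a T from the bytes at `base + offset`.
template<class T>
T load_at(const void* base, std::size_t offset)
{
    static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");
    T value;  // overwritten by the copy below
    std::memcpy(&value, static_cast<const unsigned char*>(base) + offset, sizeof(T));
    return value;
}
```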
