Why is it valid to access two union members at the same time in C++

Time:09-23

From my understanding of C++ unions, only one member in the union is active (and thus can be accessed) at any given point, and you need a flag outside the union structure to determine which one.

However, I recently came across the following code snippet in document.h from the fastjson library (the code definitely works, because fastjson is a widely used library, though it could be a non-standard use):

//! Constructor for int value.
explicit GenericValue(int i) RAPIDJSON_NOEXCEPT : data_() {
    data_.n.i64 = i;
    data_.f.flags = (i >= 0) ? (kNumberIntFlag | kUintFlag | kUint64Flag) : kNumberIntFlag;
} 

int GetInt() const { RAPIDJSON_ASSERT(data_.f.flags & kIntFlag);   return data_.n.i.i;   }

Where data_ is of union type Data:

union Data {
    String s;
    ShortString ss;
    Number n;
    ObjectData o;
    ArrayData a;
    Flag f;
};  // 16 bytes in 32-bit mode, 24 bytes in 64-bit mode, 16 bytes in 64-bit with RAPIDJSON_48BITPOINTER_OPTIMIZATION

Number n itself is a union too, while Flag f is a struct:

struct Flag {
    char payload[sizeof(SizeType) * 2 + sizeof(void*) + 2]; // 2 padding bytes
    uint16_t flags;
};

union Number {
    struct I {
        char padding[4];
        int i;
    }i;
    struct U {
        char padding2[4];
        unsigned u;
    }u;
    int64_t i64;
    uint64_t u64;
    double d;
};

This snippet confuses me because both data_.f and data_.n are accessed. How can this be valid? Could someone explain it to me?

CodePudding user response:

Why is it valid to access two union members at the same time in C++

It isn't, in general.

[class.union] says:

In a union, a non-static data member is active if its name refers to an object whose lifetime has begun and has not ended ([basic.life]). At most one of the non-static data members of an object of union type can be active at any time, that is, the value of at most one of the non-static data members can be stored in a union at any time.

and has an example describing exactly the actions of your fastjson library as Undefined Behaviour.

There is an exception to this rule:

One special guarantee is made in order to simplify the use of unions: If a standard-layout union contains several standard-layout structs that share a common initial sequence ([class.mem]), and if a non-static data member of an object of this standard-layout union type is active and is one of the standard-layout structs, it is permitted to inspect the common initial sequence of any of the standard-layout struct members; see [class.mem].

which a developer could have used to make the code perfectly well-defined and portable: just put the flags at the front (of every data type), and you'll always be able to access them to find out which union member you should be using.
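As a minimal sketch of that "flags at the front" layout (not the library's actual code): the names Header, IntValue, DoubleValue, set_int, set_double, is_int, get_int and get_double, and the two flag values, are all made up for illustration. Every struct in the union starts with the same uint16_t member, so reading f.flags is allowed no matter which of the structs is active.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag values, only loosely modelled on the ones in the question.
constexpr std::uint16_t kIntFlag    = 0x1;
constexpr std::uint16_t kDoubleFlag = 0x2;

// Each alternative is a standard-layout struct whose first member is the
// same uint16_t, so `flags` is part of their common initial sequence.
struct Header      { std::uint16_t flags; };
struct IntValue    { std::uint16_t flags; std::int64_t i64; };
struct DoubleValue { std::uint16_t flags; double d; };

union Data {
    Header f;
    IntValue n;
    DoubleValue d;
};

// Assigning a whole struct makes that member the active one.
inline void set_int(Data& data, std::int64_t i) {
    data.n = IntValue{kIntFlag, i};
}

inline void set_double(Data& data, double value) {
    data.d = DoubleValue{kDoubleFlag, value};
}

// Reading f.flags while n or d is active only inspects the common
// initial sequence, which the standard explicitly permits.
inline bool is_int(const Data& data) {
    return (data.f.flags & kIntFlag) != 0;
}

inline std::int64_t get_int(const Data& data) {
    assert(is_int(data));  // only read n when the flags say it is active
    return data.n.i64;
}

inline double get_double(const Data& data) {
    assert((data.f.flags & kDoubleFlag) != 0);
    return data.d.d;
}
```

With this layout, calling set_int(v, 42) on a Data v and then get_int(v) gives back 42, and the flag check never touches anything outside the shared first member. The cost compared with the layout in the question is that the flags can no longer hide in otherwise unused trailing bytes, so the union may end up larger.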

There is also an extension to this: the compiler can choose to support type-punning through unions, and in fact GCC does.

That doesn't mean it will work on every compiler, or every future version of GCC, and it doesn't make it good practice.
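If what you actually want is to reinterpret the bytes of one type as another, you can do that without relying on the union extension by copying the object representation with std::memcpy. A small sketch, assuming double and uint64_t have the same size (the helper names bits_of and double_from_bits are made up here):

```cpp
#include <cassert>   // for the assert-based usage mentioned below
#include <cstdint>
#include <cstring>

static_assert(sizeof(std::uint64_t) == sizeof(double),
              "this sketch assumes a 64-bit double");

// Hypothetical helper: copy the bytes of a double into an integer.
inline std::uint64_t bits_of(double d) {
    std::uint64_t u;
    std::memcpy(&u, &d, sizeof u);
    return u;
}

// Hypothetical helper: copy the bytes back into a double.
inline double double_from_bits(std::uint64_t u) {
    double d;
    std::memcpy(&d, &u, sizeof d);
    return d;
}
```

A round trip such as assert(double_from_bits(bits_of(1.5)) == 1.5) holds, and compilers typically optimise the memcpy calls away, so this usually costs nothing compared with the union trick.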

CodePudding user response:

How can this be valid?

This isn't valid in standard C++. The behaviour is undefined.

From my understanding of C++ unions, only one member in the union is active (thus can be accessed)

This is true in most cases, but there is an exception: you're allowed to access the common initial sequence of an inactive standard-layout struct member (the part it has in common with the active member). As far as I can tell, that exception doesn't apply to the shown program.

CodePudding user response:

Your understanding of unions is wrong. There is no active member there at all. If there were, the union itself would have to provide the flag you mention.

Every member of a union shares the same memory, meaning the biggest member defines the size of the union. You can use any member to overwrite the memory in the union itself (like storage for integers and chars), but the type of a member is only an interpretation of the underlying memory.

For example, if you look at the ASCII table you can find that the integer value 65 can also be interpreted as 'A'. That is what happens in the union: the memory is the same, and the members only define the interpretation of the data.

Code example:

#include <cstdint>  // uint8_t
#include <iostream>

union ASCII
{
    uint8_t Byte;
    char Sign;
};

int main()
{
    ASCII ascii;
    ascii.Byte = 65;

    std::cout << ascii.Sign << std::endl;

    return 0;
}

Program returned: 0
A