Global variables in a translation unit, will they be stored contiguous and can pointer arithmetic be

Time:11-23

Say I have global variables defined in a TU such as:

extern const std::string s0{"s0"};
extern const std::string s1{"s11"};
extern const std::string s2{"s222"};
// etc...

And a function get_1 to get them depending on an index:

size_t get_1(size_t i)
{
    switch (i)
    {
        case 0: return s0.size();
        case 1: return s1.size();
        case 2: return s2.size();
        // etc...
    }
}

And someone proposes replacing get_1 with get_2 with:

size_t get_2(size_t i)
{
    return (&s0 + i)->size();
}

  1. Are global variables defined next to each other in a translation unit like this guaranteed to be stored contiguously, and in the order defined?
    • I.e. will &s1 == &s0 + 1 and &s2 == &s1 + 1 always be true?
    • Or can a compiler (does the standard allow a compiler to) place the variable s0 higher than s1 in memory, i.e. swap them?
  2. Is it well-defined behaviour to perform pointer arithmetic, like in get_2, over such variables? (Crucially, they aren't in the same sub-object or in an array etc.; they're just globals like this.)
    • Do the rules about using relational operators on pointers from https://stackoverflow.com/a/9086675/8594193 apply to pointer arithmetic too? (Is the last comment on that answer, about std::less and friends yielding a total order over any void*s where the normal relational operators don't, relevant here too?)

Edit: this is not necessarily a duplicate of/asking about variables on the stack and their layout in memory, I'm aware of that already, I was specifically asking about global variables. Although the answer turns out to be the same, the question is not.

CodePudding user response:

Pointer arithmetic on disparate objects yields undefined behavior as per [expr.add]:

4 When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.

(4.1) — If P evaluates to a null pointer value and J evaluates to 0, the result is a null pointer value.

(4.2) — Otherwise, if P points to an array element i of an array object x with n elements (9.3.4.5), the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) array element i + j of x if 0 ≤ i + j ≤ n and the expression P - J points to the (possibly-hypothetical) array element i − j of x if 0 ≤ i − j ≤ n.

(4.3) — Otherwise, the behavior is undefined.

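Since s0 through s2 are not elements of a common array, get_2 has undefined behavior. For the purposes of pointer arithmetic, an object that is not an array element is treated as belonging to a single-element array, so &s0 + 1 is a valid one-past-the-end pointer, but it must not be dereferenced, and &s0 + i for any i greater than 1 falls under (4.3).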
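If the goal is simply index-based access, one well-defined alternative is to store the strings in a single array object, whose elements are guaranteed to be contiguous and in index order. The following is only a sketch: the names strings and get_3 are hypothetical and do not appear in the question, and it assumes the strings do not need to remain separately named globals.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical replacement for s0, s1, s2: a single array object, so the
// elements are contiguous and pointer arithmetic between them is defined.
const std::array<std::string, 3> strings{"s0", "s11", "s222"};

// Hypothetical name; plays the role of get_1/get_2 from the question.
std::size_t get_3(std::size_t i)
{
    assert(i < strings.size()); // precondition: index must be in range
    // strings.data() + i stays inside the same array, so (4.2) applies.
    return (strings.data() + i)->size();
}
```

Here strings.data() + i is covered by (4.2) because every element belongs to the same array object, which is exactly what the separate globals lack.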

As far as I can tell, the standard places no constraints on the order of these variables in memory, so the compiler may order them any way it wants, with any amount of padding or other variables between them. This is not stated explicitly, but as was pointed out to me in the comments, [expr.rel] and [expr.eq] leave the results of the relevant pointer comparisons unspecified. In particular, [expr.eq] states about operators == and != that

(3.1) — If one pointer represents the address of a complete object, and another pointer represents the address one past the last element of a different complete object, the result of the comparison is unspecified.

and [expr.rel] about <, >, <=, >= that

4 The result of comparing unequal pointers to objects is defined in terms of a partial order consistent with the following rules:

(4.1) — If two pointers point to different elements of the same array, or to subobjects thereof, the pointer to the element with the higher subscript is required to compare greater.

(4.2) — If two pointers point to different non-static data members of the same object, or to subobjects of such members, recursively, the pointer to the later declared member is required to compare greater provided the two members have the same access control (11.9), neither member is a subobject of zero size, and their class is not a union.

(4.3) — Otherwise, neither pointer is required to compare greater than the other.

Again, since s0, s1 and s2 are not elements of the same array and not members of the same object, 4.3 applies, and the result of comparing pointers to them is unspecified. In practical terms, this means that the compiler can order them in memory in an arbitrary fashion.
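On the std::less part of the question: the pointer specializations of std::less do yield a strict total order even where the built-in < does not, but that order is implementation-defined. It says nothing about which variable is placed first and does not make get_2's arithmetic valid. A minimal sketch, with the hypothetical names first, second and ordered_before chosen for illustration:

```cpp
#include <cassert>
#include <functional>
#include <string>

// Two unrelated globals, standing in for s0 and s1 from the question.
const std::string first{"first"};
const std::string second{"second"};

// True if p comes before q in the implementation-defined strict total
// order that std::less provides for pointers.
bool ordered_before(const std::string* p, const std::string* q)
{
    assert(p != nullptr && q != nullptr);
    return std::less<const std::string*>{}(p, q);
}
```

For two distinct objects, exactly one of ordered_before(&first, &second) and ordered_before(&second, &first) is true, but which one is up to the implementation, so it cannot be used to infer layout.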
