Say I have global variables defined in a TU such as:
extern const std::string s0{"s0"};
extern const std::string s1{"s11"};
extern const std::string s2{"s222"};
// etc...
And a function get_1
to get them depending on an index:
size_t get_1(size_t i)
{
switch (i)
{
case 0: return s0.size();
case 1: return s1.size();
case 2: return s2.size();
// etc...
}
}
And someone proposes replacing get_1
with get_2
with:
size_t get_2(size_t i)
{
return *(&s0 i);
}
- Are global variables defined next to each other in a translation unit like this guaranteed to be stored contiguously, and in the order defined?
- Ie will
&s1 == &s0 1
and&s2 == &s1 1
always be true? - Or can a compiler (does the standard allow a compiler to) place the variables
s0
higher thans1
in memory ie. swap them?
- Ie will
- Is it well defined behaviour to perform pointer arithmetic, like in
get_2
, over such variables? (that crucially aren't in the same sub-object or in an array etc., they're just globals like this)- Do rules about using relational operators on pointers from https://stackoverflow.com/a/9086675/8594193 apply to pointer arithmetic too? (Is the last comment on this answer about
std::less
and friends yielding a total order over anyvoid*
s where the normal relational operators don't relevant here too?)
- Do rules about using relational operators on pointers from https://stackoverflow.com/a/9086675/8594193 apply to pointer arithmetic too? (Is the last comment on this answer about
Edit: this is not necessarily a duplicate of/asking about variables on the stack and their layout in memory, I'm aware of that already, I was specifically asking about global variables. Although the answer turns out to be the same, the question is not.
CodePudding user response:
Pointer arithmetic on disparate objects yields undefined behavior as per [expr.add]:
4 When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.
(4.1) — If P evaluates to a null pointer value and J evaluates to 0, the result is a null pointer value.
(4.2) — Otherwise, if P points to an array element i of an array object x with n elements (9.3.4.5), the expressions P J and J P (where J has the value j) point to the (possibly-hypothetical) array element i j of x if 0 ≤ i j ≤ n and the expression P - J points to the (possibly-hypothetical) array element i − j of x if 0 ≤ i − j ≤ n.
(4.3) — Otherwise, the behavior is undefined.
Since s0
through s2
are not elements of an array, get_2
yields explicitly documented undefined behavior.
As far as I can tell, the standard puts no limits on the order in memory of these variables, so the compiler could order them any way it wanted, with any amount of padding or other variables between them. This is not explicitly mentioned as such, but as was pointed out to me in the comments, [expr.rel] and [expr.eq] determine that the results of relational operators in these cases are undefined/unspecified. In particular, [expr.eq] states about operators == and != that
(3.1) — If one pointer represents the address of a complete object, and another pointer represents the address one past the last element of a different complete object, the result of the comparison is unspecified.
and [expr.rel] about <, >, <=, >= that
4 The result of comparing unequal pointers to objects is defined in terms of a partial order consistent with the following rules:
(4.1) — If two pointers point to different elements of the same array, or to subobjects thereof, the pointer to the element with the higher subscript is required to compare greater.
(4.2) — If two pointers point to different non-static data members of the same object, or to subobjects of such members, recursively, the pointer to the later declared member is required to compare greater provided the two members have the same access control (11.9), neither member is a subobject of zero size, and their class is not a union.
(4.3) — Otherwise, neither pointer is required to compare greater than the other.
Again, since s0
, s1
, s2
are not part of the same array and not members of the same object, 4.3 is relevant, and the results of comparing pointers to them is unspecified. In practical terms, this means that the compiler can order them in memory in an arbitrary fashion.