Are a[i]=y++; and a[i++]=y; undefined behavior or unspecified in C language?

Time:01-13

While trying to understand why the expression v[i++]=i; has undefined behavior, I came across an explanation: the expression lies between two sequence points, and the C standard says the order in which side effects occur between two sequence points is unspecified, so when the program runs it is not certain whether the ++ operator or the = operator is applied first. This puzzles me. When the expression is evaluated, shouldn't precedence be used to judge first, and then the sequence point should be introduced to judge which sub-expression is executed first? Am I missing something?

If it is as user AnT stands with Russia explained, does that mean that for code such as a[i]=y++; or a[i++]=y; we also cannot be sure whether the ++ operator or the = operator runs first?

CodePudding user response:

The reason v[i++]=i; is undefined behavior is that the variable i is both read and written in the same expression without sequencing.

Expressions such as a[i]=y++ and a[i++]=y do not exhibit undefined behavior because no variable is both read and written in the expression without sequencing.

The = operator does however ensure that both of its operands are fully evaluated before the side effect of assigning to the left side. Specifically, a[i] is evaluated to be an lvalue designating the ith element of the array a, and y is evaluated to be the current value of y.
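As a rough illustration of the two well-defined statements, the sketch below runs each one on fresh local variables and reports what ends up stored. The struct snapshot and both function names are made up for this example and are not part of the question.

```c
#include <assert.h>

/* Hypothetical helper type: values observed after the statement has run. */
struct snapshot {
    int a0;  /* a[0] after the statement */
    int i;   /* i after the statement    */
    int y;   /* y after the statement    */
};

/* a[i] = y++;  y is read and updated only inside y++, which is sequenced. */
static struct snapshot rhs_postincrement(void)
{
    int a[2] = {0, 0};
    int i = 0;
    int y = 3;

    a[i] = y++;          /* stores the old y (3) in a[0]; y becomes 4 */
    assert(a[1] == 0);   /* the other element is untouched */

    return (struct snapshot){ a[0], i, y };
}

/* a[i++] = y;  i is read and updated only inside i++, which is sequenced. */
static struct snapshot index_postincrement(void)
{
    int a[2] = {0, 0};
    int i = 0;
    int y = 3;

    a[i++] = y;          /* stores y (3) in a[0]; i becomes 1 */
    assert(a[1] == 0);   /* the old index was used, not the new one */

    return (struct snapshot){ a[0], i, y };
}
```

In both functions the array is a real array that does not overlap i or y, which is the assumption the answer relies on.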

CodePudding user response:

The specific rule in the C standard is C 2018 6.5 2:

If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. If there are multiple allowable orderings of the subexpressions of an expression, the behavior is undefined if such an unsequenced side effect occurs in any of the orderings.

The first sentence is the critical one here. First, consider v[i] = i++;. Here, the i in v[i] computes the value of i, and the i++ both computes the value of i and increments the stored value of i. Computing the value of i is a value computation of i. Incrementing the stored value of i is a side effect. To determine whether the behavior of v[i] = i++; is undefined, we ask whether the side effect is unsequenced relative to any other side effect on i or to a value computation on i.

There is no other side effect on i, so it is not unsequenced relative to any other side effect.

There is a value computation in i++, but the side effect and this value computation are sequenced by the specification of the postfix ++ operator. C 2018 6.5.2.4 2 says:

… The value computation of the result is sequenced before the side effect of updating the stored value of the operand…

So we know the computation of the value of i in i++ is sequenced before the side effect of incrementing the stored value.

Now we consider the value computation of the i in v[i]. The specification does not tell us about this, so let’s consider the assignment operator, =. The specification of assignment does say something about sequencing, in C 2018 6.5.16 3:

… The side effect of updating the stored value of the left operand is sequenced after the value computations of the left and right operands. The evaluations of the operands are unsequenced.

The first sentence tells us the update of v[i] is sequenced after the value computations of the left and right operands. But it does not tell us anything about the side effect in i++ relative to the value computation of i in v[i].

Therefore, the value computation of i in v[i] is unsequenced relative to the side effect on i in i++, so the behavior of the statement is not defined by the C standard.
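If the intent behind v[i] = i++; is known, it can be spelled out with separate statements so that every read and write of i is sequenced. The sketch below shows the two plausible intents. The function names are invented here, and the code assumes v does not point at the same object as i.

```c
#include <assert.h>
#include <stddef.h>

/* Intent 1: index with the old value, store the old value, then increment. */
static void store_then_increment(int *v, int *i)
{
    assert(v != NULL && i != NULL);
    v[*i] = *i;   /* this statement only reads *i */
    *i += 1;      /* the update happens in its own statement */
}

/* Intent 2: increment first, then store the old value at the new index. */
static void increment_then_store(int *v, int *i)
{
    assert(v != NULL && i != NULL);
    int old = *i; /* remember the value i++ would have produced */
    *i += 1;
    v[*i] = old;  /* indexes with the already updated value */
}
```

Each statement ends at a semicolon, so the order of the read and the update is fixed by the program text rather than left open.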

In a[i] = y++; we have:

  • A value computation on i in a[i].
  • A value computation on y in y++.
  • An update of the stored value of y in y++.
  • A value computation on a in a[i].
  • An update of the stored value of a[i] in a[i] = ….

The only object that is updated twice or that is both updated and evaluated is y, and we know from above that the value computation on y in y++ is sequenced before the update of y. So this statement does not contain any side effect that is unsequenced relative to another side effect or value computation on the same object. So its behavior is not undefined by the rule in C 2018 6.5 2.

Similarly, in a[i++] = y;, we have:

  • A value computation on i in a[i++].
  • An update of the stored value of i in i++.
  • A value computation on y.
  • A value computation on a in a[i++].
  • An update of the stored value of a[i] in a[i++] = ….

Again, there is only one object with two operations on it, and those operations are sequenced. The behavior is not undefined by the rule in C 2018 6.5 2.

Note

In the above, we assume neither a nor v is a pointer such that a[i] or v[i] would be i or y. If instead we consider this code:

int y = 3;
int *a = &y;
int i = 0;
a[i] = y++;

Then the behavior is undefined because a[i] is y, so the code updates y twice, once for the assignment a[i] = … and once for y++, and these updates are unsequenced. The specification of assignment says the update to the left operand is sequenced after the value computation of the result (which is the value of the right side of the assignment), but the increment for ++ is a side effect, not part of the value computation. So the two updates are unsequenced, and the behavior is not defined by the C standard.
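When the destination might alias the incremented object, one way to stay within defined behavior is to split the work into separate statements. The helper below is a hypothetical sketch, not something from the standard, and the order it picks (increment first, store last) is just one possible choice of intent.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: well defined even when dst and src point to the same int. */
static void store_old_and_increment(int *dst, int *src)
{
    assert(dst != NULL && src != NULL);
    int old = *src;   /* read the current value */
    *src = old + 1;   /* first update, complete when this statement ends */
    *dst = old;       /* second update, sequenced after the first */
}
```

With int y = 3; int *a = &y;, calling store_old_and_increment(&a[0], &y) leaves y equal to 3, because the store of the old value is the last write. With two distinct objects, the destination receives 3 and the source becomes 4.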

CodePudding user response:

An attempt to explain the "standardese" terms plainly:

The standard says (C17 6.5) that in an expression, a side effect of a variable may not occur in an unsequenced order in relation to a value computation of that same object.

To make sense of these strange terms:

  • Side effect = writing to a variable or performing a read or write access to a volatile variable.
  • Value computation = reading the value from memory.
  • Unsequenced = The order between accesses/evaluations is neither specified nor well-defined. C has the concept of sequence points, which are certain points in the program at which all previous side effects must have taken place. For example, a ; introduces a sequence point. Two parts of an expression are unsequenced in relation to each other when the order of evaluation of each part is not well-defined before the next sequence point. (A complete list of all sequence points can be found in C17 Annex C.)

So when translated from standardese to English, v[i++]=i; has undefined behavior since i is written to in an unspecified order related to the other read of i in the same expression. How do we know that?

  • The assignment operator = says that (6.5.16) "the evaluations of the operands are unsequenced", referring to the left and right operands of =.
  • The postfix ++ operator says that (6.5.2.4) "As a side effect, the value of the operand object is incremented" and "The value computation of the result is sequenced before the side effect of updating the stored value of the operand". In practice this means that i is first read and the ++ is applied later, though before the next sequence point, in this case the ;.

In case of a[i]=y++; or a[i++]=y; everything happens on different variables. There are two side effects, updating i (or y) and updating a[i], but they are done on different objects, so both examples are well-defined.
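For contrast, here is a small sketch of operators that do introduce a sequence point inside an expression: the comma operator and logical AND. The function names are invented for the illustration.

```c
#include <assert.h>

/* Comma operator: the left operand, side effects included, is complete
   before the right operand is evaluated. */
static int comma_sequenced(void)
{
    int i = 0;
    int r = (i++, i);   /* sequence point after i++, so r reads the updated i */
    assert(i == 1);
    return r;
}

/* Logical AND: there is a sequence point after the left operand. */
static int and_sequenced(void)
{
    int i = 0;
    int r = (i++ == 0) && (i == 1);   /* the right side sees i already incremented */
    assert(i == 1);
    return r;
}
```

Both functions read and write i in one expression, yet both are well defined, because the write is sequenced before the later read. The assignment operator gives no such guarantee between its two operands.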

CodePudding user response:

The C standard (C11 draft) says the following about the postfix ++ operator:

(6.5.2.4.2) The result of the postfix ++ operator is the value of the operand. As a side effect, the value of the operand object is incremented (that is, the value 1 of the appropriate type is added to it). [...]

A sequence point is defined by a point in the code where it is guaranteed that all side effects before the point have taken effect and no side effects after the point have taken effect.

There are no intermediate sequence points in the expression v[i++] = i;. Thus it is not defined whether the side effect of the expression i++ (incrementing i) takes effect before or after the right-hand side i is evaluated. Thus it is the value of the right-hand side i which is not defined in this expression.

This problem does not exist in the expression a[i++] = y; because the value of the right-hand side y is not affected by the side effect of i++.

CodePudding user response:

When the expression is evaluated ...

Which expression?

v[i++]=i;

is a statement. It consists of a toplevel assignment expression a = b, where a and b are both themselves expressions.

The left-hand expression a is itself of the form c[d], where d is another subexpression of the form e++ and e is yet another expression, finally resolved to i.

If it helps we can write the whole thing out in pseudo-function-call style, like

assign(array_index(v, increment_and_return_old_value(i)), i);

Now, the problem is that the standard doesn't tell us whether the final value parameter i is obtained before or after i is mutated by increment_and_return_old_value(i) (or by i++).

... and then the sequence point should be introduced to judge which sub-expression is executed first?

The , in a function call parameter list isn't a sequence point. The relative order in which function parameters are evaluated is not defined (only that they must all have been evaluated before the function body is entered).

The same logic applies to the original code - the standard says there is no sequence point, so there is no sequence point.
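The function-call analogy can be tried out directly. In the sketch below (all names are hypothetical), two calls with a side effect are passed as arguments, so their relative order is not fixed by the standard. Unlike v[i++]=i;, this is only unspecified rather than undefined, because function calls are not interleaved with each other. Evaluating into local variables first pins the order down.

```c
#include <assert.h>

static int counter;   /* hypothetical shared state for the demonstration */

/* Has a side effect: bumps the counter and returns the new value. */
static int next_id(void)
{
    return ++counter;
}

static int first_minus_second(int first, int second)
{
    return first - second;
}

/* The two next_id() calls may run in either order: the result is -1 or 1. */
static int unspecified_order(void)
{
    counter = 0;
    return first_minus_second(next_id(), next_id());
}

/* Separate statements fix the order, so the result is always -1. */
static int fixed_order(void)
{
    counter = 0;
    int first = next_id();    /* 1 */
    int second = next_id();   /* 2 */
    assert(counter == 2);
    return first_minus_second(first, second);
}
```

Which of the two values unspecified_order() returns can differ between compilers, which is the same kind of freedom the standard leaves for the operands of =.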


does that mean that for code such as a[i]=y++; or a[i++]=y; we also cannot be sure whether the ++ operator or the = operator runs first?

It's not the assignment that is the problem, it is evaluating the right-hand operand to be assigned.

And, in these cases, there is no relationship between left-hand side thing being assigned to and the right-hand side value being assigned. So although we still cannot be sure which is evaluated first, it doesn't matter.

If I wrote out explicitly

int *lhs = &a[i];
int rhs = y++;
*lhs = rhs;

then reversing the first two lines would make no difference. Their relative order doesn't matter, so the lack of a defined relative order doesn't matter.

Conversely, for completeness,

int *lhs = &v[i++];
int rhs = i;
*lhs = rhs;

is the original case where the order of the first two lines does matter, and the fact that it is unspecified is a problem.

CodePudding user response:

You are correct that the order of execution of the side effects of expressions between sequence points is unspecified in the C standard.

In the expression v[i++]=i;, the value of i is used for both the array index and the value being assigned to the array element. The expression involves two side effects: the incrementing of i and the assignment to the array element. The order in which these side effects occur is unspecified, so it is not guaranteed that the increment of i will happen before or after the assignment.

In C, operator precedence and associativity determine how operands are grouped with operators, not when they are evaluated. The ++ operator and the = operator have different precedence, so the expression is parsed according to the order of precedence. However, the order in which sub-expressions are evaluated and their side effects take place is not specified by operator precedence.

The C standard defines sequence points to specify the order of evaluation of side effects. A sequence point is a point in the program where all previous evaluations and side effects are guaranteed to be complete and no subsequent evaluations or side effects have yet taken place. The ++ operator and the = operator are both evaluated between sequence points, so the order of their side effects is unspecified.

It is important to keep in mind that the unspecified order of side effects may cause different behavior in different implementations or with different optimization settings. Therefore, it is generally recommended to avoid such expressions in the code.
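One common way to avoid such expressions is to keep the increment out of the statement that reads the variable. The sketch below assumes the intent of a loop built around v[i++]=i; was to store each index in its own slot; that intent and the function name are guesses made for this example. As far as I know, GCC reports the original form under -Wsequence-point (enabled by -Wall) and Clang under -Wunsequenced, but check your compiler's documentation.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed intent: v[0] = 0, v[1] = 1, ..., v[n-1] = n-1. */
static void fill_with_index(int *v, size_t n)
{
    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        v[i] = (int)i;   /* i is only read here; the increment lives in the loop header */
    }
}
```

The loop body reads i twice and never writes it, so there is nothing left for the rule in 6.5 to object to.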
