What precisely is an expression?

Time:11-15

Consider whether x in the declaration int x; is an expression.

I used to think that it's certainly not, but the grammar calls the variable name an id-expression here.

One could then argue that only expression is an expression, not ??-expression. But then in 1 + 2, neither 1 nor 2 matches, because those are additive-expression and multiplicative-expression respectively, not expressions. But common sense says those should be called expressions too.

We could decide that any ??-expression (including expression) is an expression, but then the variable name in a declaration matches as well.

We could define an expression to be any ??-expression except id-expression, but this feels rather arbitrary.

What's the right grammatical definition of an expression, and is the variable name in its declaration an expression or not?

CodePudding user response:

After looking at the links provided by @LanguageLawyer (1, 2), I'm convinced the consensus is that id-expression is a misnomer, and not always an expression (e.g. it's not an expression in a declaration).

Then, a source substring is an expression if at least one of its parents in the parse tree is called:

  • expression, or
  • *-expression¹ but not id-expression,

and that parent expands exactly to this substring and nothing more.

This is the same definition @n.m. proposed, except I allow "*-expression and not id-expression" nodes as well.

¹ * is a wildcard for any string.
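
A minimal sketch of the two rules, assuming hand-built and heavily simplified parse trees (this is not how a real compiler represents them). The `Node` type, the helper names, and the abbreviated trees for `1 + 2;` and `int x;` are all hypothetical and exist only for illustration; character ranges are half-open offsets into the source text.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One node of a hand-built, simplified parse tree (hypothetical type).
// [begin, end) is the range of source characters the node expands to.
struct Node {
    std::string symbol;
    std::size_t begin;
    std::size_t end;
    std::vector<Node> children;
};

// True if `s` ends with `suffix`.
bool ends_with(const std::string& s, const std::string& suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Stricter rule: only nodes named exactly `expression` count.
bool strict_rule(const std::string& symbol) {
    return symbol == "expression";
}

// Relaxed rule: `expression`, or any `*-expression` except `id-expression`.
bool relaxed_rule(const std::string& symbol) {
    return symbol == "expression" ||
           (ends_with(symbol, "-expression") && symbol != "id-expression");
}

// True if some node accepted by `rule` expands to exactly [begin, end).
bool is_expression(const Node& node, std::size_t begin, std::size_t end,
                   bool (*rule)(const std::string&)) {
    if (node.begin == begin && node.end == end && rule(node.symbol)) return true;
    for (const Node& child : node.children) {
        if (is_expression(child, begin, end, rule)) return true;
    }
    return false;
}

// Abbreviated tree for the statement `1 + 2;` (single-child chains are collapsed).
Node tree_for_sum() {
    return Node{"expression-statement", 0, 6, {
        Node{"expression", 0, 5, {
            Node{"additive-expression", 0, 5, {
                Node{"additive-expression", 0, 1, { Node{"literal", 0, 1, {}} }},
                Node{"multiplicative-expression", 4, 5, { Node{"literal", 4, 5, {}} }},
            }},
        }},
    }};
}

// Abbreviated tree for the declaration `int x;`.
Node tree_for_declaration() {
    return Node{"simple-declaration", 0, 6, {
        Node{"decl-specifier-seq", 0, 3, {}},
        Node{"init-declarator-list", 4, 5, {
            Node{"declarator-id", 4, 5, {
                Node{"id-expression", 4, 5, { Node{"identifier", 4, 5, {}} }},
            }},
        }},
    }};
}
```

With these trees, `is_expression(tree_for_sum(), 0, 1, strict_rule)` is false while the same call with `relaxed_rule` is true, which is the difference between the two definitions. For `x` in `int x;` (range 4 to 5 in `tree_for_declaration()`), both rules return false.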

CodePudding user response:

What does the question "is x an expression" mean?

When we talk about specific occurrences of expressions and identifiers in a specific program, we must consider its parse tree. A substring of a program is an expression if some occurrence of an expression node in its parse tree expands to that substring.

Thus, x in the declaration int x; is not an expression because there is no expression in the parse tree (of any valid program that contains int x; as a declaration) that expands to this occurrence of x. There is an id-expression node, but that particular id-expression is not an expansion of an expression node, it is a part of an expansion of a declaration node.

When we talk about expressions and identifiers in isolation, "is a" means "expands to/contracts to, according to some rule in the grammar". Thus, taken in isolation, x is an expression. This means we can construct a program where x is an expression according to the definition above.
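
As a small illustration of constructing such a program, the sketch below puts the name `x` in both roles. The function name `declared_then_used` is hypothetical, and the comments describe the parse informally rather than quoting the grammar.

```cpp
#include <cassert>

// Contrasts the two syntactic roles of the name `x`.
int declared_then_used() {
    int x;           // declaration: this `x` is part of the declarator, not an expression
    x = 1 + 2;       // expression statement: an `expression` node expands to `x = 1 + 2`
    assert(x == 3);  // sanity check on the computed value
    return x;        // return statement: an `expression` node expands to exactly `x`
}
```

The `x` in `return x;` is the occurrence that satisfies the parse-tree definition; the `x` in `int x;` never does.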

These definitions are purely syntactic and as such are valid for any language and grammar production. However, the C++ standard states informally that

An expression is a sequence of operators and operands that specifies a computation

and in several places talks about subexpressions as expressions in their own right. Thus the term "expression" in the standard does not coincide with the grammar element expression.

This is not an insurmountable problem however. The grammar is only a tool. The standard could have defined the grammar differently:

expression:
    expression = expression
    expression + expression
    expression * expression
    ...
    ( expression )
    id-expression

and resolved the ambiguities in English text. If we want the notion of expression to correspond closely to the grammar element, we should probably mentally consider the grammar as presented this way.

Alternatively, as you suggest, we can consider an expansion of any ??-expression an "expression", but replace certain occurrences of id-expression with a new symbol. The two approaches seem equivalent.

Note: this version of the answer is a complete rewrite. The previous version was a result of a misunderstanding.
