Better to be more explicit or less explicit in a parsing grammar?

Time:08-07

Let's say I have a SQL-like language that allows two expressions to be compared for equality only if they have the same type; if the types differ, it raises an error. Examples would be:

1 = 1         # true
1 = 1.2       # false
1 = '1'       # error
1 = '1'::int  # true

What would the best production for this be?

EQ:      expr '='  expr

Or something much more detailed, which would attempt to catch type errors at the lexing (parsing?) stage, such as:

EQ:      numeric_expr '=' numeric_expr
       | string_expr  '=' string_expr
       | ...etc

Answer:

SO is notoriously hostile to questions that ask for an opinion. But in this case, I think there is an objective preference, so I'll hazard offering an answer.

You cannot, in general, detect a type mismatch in a context-free grammar, for the simple reason that the type of a variable (or fieldname, or whatever) is not part of its syntax. Looking up the type of a variable is, pretty well by definition, not context-free.
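To make that concrete, here is a minimal Python sketch. The two schemas and the `operands_match` helper are hypothetical, invented only for illustration. The expression text `a = b` is identical in both cases, yet it is well-typed under one schema and ill-typed under the other, so a grammar that sees only the tokens has nothing to decide with.

```python
# Hypothetical schemas: column name -> declared type.
SCHEMA_ONE = {"a": "int", "b": "int"}
SCHEMA_TWO = {"a": "int", "b": "text"}


def operands_match(left, right, schema):
    """Return True if both columns have the same declared type in this schema."""
    return schema[left] == schema[right]


# The source text is the same both times: a = b
same_in_one = operands_match("a", "b", SCHEMA_ONE)
same_in_two = operands_match("a", "b", SCHEMA_TWO)
print(same_in_one, same_in_two)
```

The answer depends on a lookup outside the expression itself, which is exactly the context a context-free grammar does not have.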

Of course, you can do some very limited typechecking on constant expressions, but that's not a particularly interesting case. Most SQL queries do not involve comparing two literal values.

Trying to classify expression productions syntactically by type will almost certainly lead to grammar conflicts. But even if you somehow manage to do it, you will then have to try to construct a sensible error message, since flagging a type mismatch as "Syntax error" is highly misleading to the programmer. And producing good error messages during a parse is much harder than you might think.

By contrast, it is very easy to catch type errors by analysing the parse tree created by the parser, at least if there is some mechanism to identify the type of a named object. Furthermore, if you are using a typechecker to check for type mismatches, rather than a general-purpose parser, it is almost trivial to produce a meaningful error message.
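As a rough sketch of what that separate pass might look like, assume the parser used the simple `expr '=' expr` production and built a small tree. Every name here (`Literal`, `Column`, `Cast`, `Eq`, `TypeMismatch`, `type_of`, the `TYPE_FAMILY` table) is made up for the example, and treating `int` and `float` as one "numeric" family is an assumption chosen so that `1 = 1.2` is allowed, as in the question.

```python
from dataclasses import dataclass


@dataclass
class Literal:
    value: object


@dataclass
class Column:
    name: str


@dataclass
class Cast:
    operand: object
    target: str


@dataclass
class Eq:
    left: object
    right: object


class TypeMismatch(Exception):
    """Raised by the checker, not by the parser."""


# Assumption: int and float compare with each other, so both are "numeric".
TYPE_FAMILY = {"int": "numeric", "float": "numeric", "text": "text", "bool": "bool"}
LITERAL_TYPES = {bool: "bool", int: "numeric", float: "numeric", str: "text"}


def family(type_name):
    if type_name not in TYPE_FAMILY:
        raise TypeMismatch(f"unknown type {type_name!r}")
    return TYPE_FAMILY[type_name]


def type_of(node, schema):
    """Walk the parse tree and return the type family of the node."""
    if isinstance(node, Literal):
        return LITERAL_TYPES[type(node.value)]
    if isinstance(node, Column):
        if node.name not in schema:
            raise TypeMismatch(f"unknown column {node.name!r}")
        return family(schema[node.name])
    if isinstance(node, Cast):
        type_of(node.operand, schema)  # the operand must itself be well-typed
        return family(node.target)
    if isinstance(node, Eq):
        left = type_of(node.left, schema)
        right = type_of(node.right, schema)
        if left != right:
            raise TypeMismatch(f"cannot compare {left} with {right} using '='")
        return "bool"
    raise TypeError(f"unexpected node {node!r}")


# 1 = '1'::int passes the check; 1 = '1' is rejected with a specific message.
type_of(Eq(Literal(1), Cast(Literal("1"), "int")), {})
try:
    type_of(Eq(Literal(1), Literal("1")), {})
except TypeMismatch as exc:
    print(f"type error: {exc}")
```

Because the checker knows which two types it was comparing when it failed, the message can name them, which a generic "syntax error" from the parser could not.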

So it's not really a question of "explicitness". It's a question of detecting errors at the correct point during program analysis.
