Home > Mobile >  How to parse dot operator in language syntax?
How to parse dot operator in language syntax?

Time:02-16

Let's say I'm writing a parser that parses the following syntax:

foo.bar().baz = 5;

The grammar rules look something like this:

program: one or more statement
statement: expression followed by ";"
expression: one of:
  - identifier (\w )
  - number (\d )
  - func call: expression "(" ")"
  - dot operator: expression "." identifier

Two expressions have a problem, the func call and the dot operator. This is because the expressions are recursive and look for another expression at the start, causing a stack overflow. I will focus on the dot operator for this quesition.

We face a similar problem with the plus operator. However, rather than using an expression you would do something like this to solve it (look for a "term" instead):

add operation: term " " term
term: one of:
  - number (\d )
  - "(" expression ")"

The term then includes everything except the add operation itself. To ensure that multiple plus operators can be chained together without using parenthesis, one would rather do:

add operation: term, one or more of (" " followed by term)

I was thinking a similar solution could for for the dot operator or for function calls.

However, the dot operator works a little differently. We always evaluate from left-to-right and need to allow full expressions so that you can do function calls etc. in-between. With parenthesis, an example might be:

(foo.bar()).baz = 5;

Unfortunately, I do not want to require parenthesis. This would end up being the case if following the method used for the plus operator.

How could I go about implementing this?

Currently my parser never peeks ahead, but even if I do look ahead, it still seems tricky to accomplish.

CodePudding user response:

The easy solution would be to use a bottom-up parser which doesn't drop into a bottomless pit on left recursion, but I suppose you have already rejected that solution.

I don't understand your objection to using a looping construct, though. Postfix modifiers like field lookup and function call are not really different from binary operators like addition (except, of course, for the fact that they will not need to claim an eventual right operand). Plus and minus intermingle freely, which you can parse with a repetition like:

additive: term ( ' ' term | '-' term )*

Similarly, postfix modifiers can be easily parsed with something like:

postfixed: atom ( '.' ID | '(' opt-expr-list `)` )*

I'm using a form of extended BNF: parentheses group; | separates alternatives and binds less stringly than concatenation; and * means "zero or more repetitions" of the atom on its left.

Another postfix operator which falls into the same category is array/map subscripting ('[' expr ']'), although you might also have other postfix operators.

Note that like the additive syntax above, selecting the appropriate alternative does not require looking beyond the next token. It's hard to parse without being able to peek one token into the future. Fortunately, that's very little overhead.

CodePudding user response:

One way could be for the dot operator to parse a non-dot expression, that is, a rule that is the same as expression but without the dot operator. This prevents recursion.

Then, when the non-dot expression has been parsed, check if a dot and an identifier follows. If this is not the case, we are done. If this is the case, wrap the current node up in a dot operation node. Then, keep track of the entire string text that has been parsed for this operation so far. Then revert everything back to before the operation was being parsed, and now re-parse a "custom expression", where the first directly-nested expression would really be trying to match the exact string that was parsed before rather than a real expression. Repeat until there are no more dot-identifier pairs (this should happen automatically by the new "custom expression").

This is messy, complicated and possibly slow, and I'm not entirely sure if it'll work but I'll try it out. I'd appreciate alternative solutions.

  • Related