I have a scripting language that I am implementing where every statement is terminated by a newline, except possibly the last one. It's also possible to have lines that are just newlines. Here's one implementation (statement code omitted since it doesn't really add anything):
@_('statement_set statement', 'statement_set')
def script(self, p):
if len(p) > 1:
p.statement_set.append(p.statement)
return p.statement_set
@_('statement_or_NL')
def statement_set(self, p):
return p.statement_or_NL
@_('statement_set statement_or_NL')
def statement_set(self, p):
p.statement_set.append(p.statement_or_NL)
return p.statement_set
@_('statement NEWLINE', 'NEWLINE')
def statement_or_NL(self, p):
if len(p) > 1:
return [p.statement]
else:
return []
@_('PLUS', 'MINUS')
def statement(self, p):
return p[0]
That generates the following rules:
Rule 0 S' -> script
Rule 1 script -> statement_set
Rule 2 script -> statement_set statement
Rule 3 statement_set -> statement_set statement_or_NL
Rule 4 statement_set -> statement_or_NL
Rule 5 statement_or_NL -> NEWLINE
Rule 6 statement_or_NL -> statement NEWLINE
If you'd like, I can post the generated states as well. The issue I'm running into is that, even when I eliminate all shift-reduce conflicts, I still almost always fail in one of the three scenarios of code that starts with a newline, code that ends with a newline, and code that has no newline at the end. Something goes awry in the parsing such that it will read the final statement (which ends with an EOF) as a "statement NEWLINE" and then error out because there is no new-line. Worse comes to words, I have a version that works as long as there's no final statement without a newline, so I could probably rig the parser to either add it to the initial string with the code, or to insert the token, but that feels clunky.
I got this working in Ply previously, and I thought it was essentially the same code, but it does not seem to work for me.
CodePudding user response:
Here's a fairly general solution for this sort of problem. It basically takes the statement list to be a two-state automaton, alternating between "ends-with-newline" and "ends-with-statement". The final result is a list of all statements (although you could do something more interesting with the newlines as well).
Since all newlines are ignored except for their role in separating statements, you can have any number of consecutive newlines, and newline(s) at the end of the input are optional. The simple transition diagram:
CURRENT STATE INPUT RESULT STATE
------------------- --------- -------------------
ends-with-newline NL ends-with-newline
ends-with-newline statement ends-with-statement
ends-with-statement NL ends-with-newline
ends-with-statement statement *ERROR*
Finite automaton are easily converted into CFGs using the template:
RESULT: CURRENT INPUT
That leads to the following (in which the actual definition of statement
is omitted, since the only thing that matters about it is that it cannot derive anything which starts with a statement terminator, which would be unusual to say the least, and that it must not derive empty, which occasionally requires some adjustments):
@_('script_ends_nl', 'script_ends_stmt')
def script(self, p):
return p[0]
@_('')
def script_ends_nl(self, p):
return []
@_(' script_ends_nl NL ',
' script_ends_stmt NL ')
def script_ends_nl(self, p):
return p[0]
@_('script_ends_nl statement')
def script_ends_stmt(self, p):
return p.script_ends_nl [p.statement]
Or, using Sly's debugfile, the grammar:
Rule 0 S' -> script
Rule 1 script -> script_ends_stmt
Rule 2 script -> script_ends_nl
Rule 3 script_ends_nl -> script_ends_stmt NL
Rule 4 script_ends_nl -> script_ends_nl NL
Rule 5 script_ends_nl -> <empty>
Rule 6 script_ends_stmt -> script_ends_nl statement