Home > Mobile >  Grammar for discard expression in PLY(EBNF) is causing binary expression to be treated as a unary ex
Grammar for discard expression in PLY(EBNF) is causing binary expression to be treated as a unary ex

Time:05-25

I am writing a parser (using PLY) for a subset of python. It includes: int, name, add, unarysub, assignments, and function_calls.

So far I have the following grammar:

# Grammar for P0:
# statements := statement 
# statement := simple_statement
# simple_statement := assignment | d_expression
# assignment := NAME ['=' expression]
# d_expression := expression
# expression := sum
# sum := sum ' ' term | term
# term := factor
# factor := ' ' factor | '-' factor | primary
# primary: primary '(' [arguments] ') | atom
# atom := INT | NAME
# arguments := args | empty
# args := arg | arg ',' args
# arg := expression

It works as intended for pretty much all the cases I need. However, I am having trouble with discard expressions(d_expression) above. Python supports discard expressions. For eg, python treats both x = 1 2 and 1 2 as valid program. The former gets an Assign node in python AST, while the latter gets an Expr node(which is discarded, hence the name discard expression).

I am trying to emulate this behavior in my parser, but when implementing this feature, my parser treats 1 - 2 as a list of two Expr nodes(Expr(Constant(value=1), Expr(UnaryOp(op=USub(), operand=Constant(value=2)). This is an incorrect behavior and it should throw a syntax error as I don't have a expression MINUS expression in my grammar. I am not sure where I am going wrong, any help is appreciated.

Below is the code implementation of my parser:

class MyParser:
    tokens = MyLexer.tokens

    precedence = (
        ('left', 'PLUS', 'MINUS'),
        ('right', 'UMINUS'),
    )

    def __init__(self):
        self.lexer = P0Lexer()
        self.parser = yacc.yacc(module=self)

    def p_module(self, p):
        '''
        module : statements
        '''
        p[0] = Module(body=p[1])
        logging.info(ast.dump(p[0]))

    def p_statements(self, p):
        '''
        statements : statement
                   | statement statements
        '''
        if len(p) == 2:
            p[0] = [p[1]]
        else:
            p[0] = [p[1]]   p[2]
        for i in p[0]:
            logging.info(ast.dump(i))

    def p_statement(self, p):
        '''
        statement : simple_statement
        '''
        p[0] = p[1]
        logging.info(ast.dump(p[0]))

    def p_simple_statement(self, p):
        '''
        simple_statement : assignment
        '''
        p[0] = p[1]
        logging.info(ast.dump(p[0]))
    
    def p_assignment(self, p):
        '''
        assignment : NAME EQUALS expression
        '''
        p[0] = Assign(targets=[Name(id=p[1], ctx=Store())], value=p[3])
        logging.info(ast.dump(p[0]))

    # def p_d_expression(self, p):
    #     '''
    #     d_expression : expression
    #     '''
    #     p[0] = Expr(value=p[1])
    #     logging.info(ast.dump(p[0]))

    def p_expression(self, p):
        '''
        expression : sum
        '''
        p[0] = p[1]
        logging.info(ast.dump(p[0]))

    def p_sum(self, p):
        '''
        sum : sum PLUS term
            | term
        '''
        if len(p) == 2:
            p[0] = p[1]
        else:
            p[0] = BinOp(left=p[1], op=Add(), right=p[3])
        logging.info(ast.dump(p[0]))

    def p_term(self, p):
        '''
        term : factor
        '''
        p[0] = p[1]
        logging.info(ast.dump(p[0]))

    def p_factor(self, p):
        '''
        factor : MINUS factor %prec UMINUS
               | primary
        '''
        if len(p) == 2:
            p[0] = p[1]
        else:
            if p[1] == ' ':
                p[0] = UnaryOp(op=UAdd(), operand=p[2])
            else:
                p[0] = UnaryOp(op=USub(), operand=p[2])
        logging.info(ast.dump(p[0]))

    def p_primary(self, p):
        '''
        primary : primary LPAREN arguments RPAREN
                | atom
        '''
        if len(p) == 2:
            p[0] = p[1]
        else:
            p[0] = Call(func=p[1], args=p[3])
        logging.info(ast.dump(p[0]))

    def p_arguments(self, p):
        '''
        arguments : args 
                 | empty
        '''
        if len(p) == 2:
            p[0] = p[1] if p[1] is not None else []
        else:
            p[0] = []
        for i in p[0]:
            logging.info(ast.dump(i))

    def p_args(self, p):
        '''
        args : arg
            | arg COMMA args
        '''
        if len(p) == 2:
            p[0] = [p[1]]
        else:
            p[0] = [p[1]]   p[3]
        for i in p[0]:
            logging.info(ast.dump(i))

    def p_arg(self, p):
        '''
        arg : expression
        '''
        p[0] = p[1]
        logging.info(ast.dump(p[0]))

    def p_atom(self, p):
        '''
        atom : INT
             | NAME
        '''
        if isinstance(p[1], int):
            p[0] = Constant(value=p[1])
        else:
            p[0] = Name(id=p[1], ctx=Load())
        logging.info(ast.dump(p[0]))

    def p_empty(self, p):
        '''
        empty :
        '''
        pass

    def p_error(self, p):
        if p:
            logging.error("Syntax error at '%s'" % p.value)
        else:
            logging.error("Syntax error at EOF")
        exit(1)

CodePudding user response:

I suspect the issue here is that

statements := statement 

is your culprit. This allows you to place any two statements side-by-side and have it be considered a statements. If that's the case, then 1 - 2 is indeed a valid statements, because, as you noted, this can be broken apart into the statements 1 and -2, each of which is a discard-expression.

It might be helpful to look at other programming languages as an example to see how to resolve this. In languages like C, C , or Java, expressions can be made into statements because there's a grammar rule to the effect of

statement := expression ;

with the ; serving as a separator. That means that if you were to find something like

1   2 2   3;

in a C/C /Java program, it would be a syntax error because while 1 2 and 2 3 are expressions, those expressions aren't statements without a semicolon after them. On the other hand, it would be legal in those languages to write

1   2; 2   3;

on one line, and there's a clear delimiter between the two expressions making up the two statements.

Now let's look at Python. Python will not allow you to write

1   2 2   3

and have it count as two expressions. Why not? Well, if you check out the Python grammar, you can see that in the way it parses a series of simple statements into a simple_stmts expression. (Python uses a parsing expression grammar rather than a CFG, so the syntax is a bit different.)

simple_stmts:
    | simple_stmt !';' NEWLINE
    | ';'.simple_stmt  [';'] NEWLINE 

In other words, a simple_stmts is either

  1. A simple_stmt followed by a newline, or
  2. A series of simple_stmts, separated by semicolons, and followed by a newline.

Each of these rules gives clear delimiters between expressions, either newlines or semicolons (or both).

So in this sense, I suspect the issue is fundamentally not one about drop expressions but rather about how you allow multiple statements to be separated from one another. Requiring some sort of delimiter - say, a newline or semicolon - after each expression used as a statement should fix this.

CodePudding user response:

Thanks to @templatetypdef. I was allowing two expression without a separator.

I solved it by introducing this production:

simple_stmts:
    | simple_stmt !';' NEWLINE  # Not needed, there for speedup
    | ';'.simple_stmt  [';'] NEWLINE 
  • Related