Home > Blockchain >  Regular expressions representing BNF
Regular expressions representing BNF

Time:11-28

I am working on a lexer. I need to identify patterns in BNF and specify them using Regular expressions. I know there are TOKENS e.g. keywords, identifiers, operators etc. So far I have defined Regular expressions:

 digit=[0-9]
    integer={digit} 
    letter=[a-zA-Z]

But the given BNF rule for Identifier is:

   < id > ::= < letter >
                | "_"
                | < id > < digit >
                | < id > < letter >

Since the <id> is defined by Recursive pattern, how can I express this pattern using Regular expression. I tried this Regular expression: id={letter} |"_"|{id}{digit} for the above BNF rule but it is giving me error that Regular expression contains cycle.

CodePudding user response:

Looking at the BNF we can see that an <id> can begin with a letter or underscore. We can also see that an <id> can be followed by either a digit or letter and it is still a valid <id>. This is implies that an <id> begins with either a letter or underscore and can be followed by 0 or more digits or letters. This suggests the following regular expression:

id = [a-zA-Z_][0-9a-zA-Z]*
  1. [a-zA-Z_] Matches a letter or '_'
  2. [0-9a-zA-Z]* Matches a digit or letter 0 or more times.

But since you already have {digit} and {letter} already defined as individual character classes, this would be using JFlex (I am not that familiar with JFlex, so I may not have the JFlex syntax exactly right):

id = ({letter}|_)({digit}|{letter})*

This would be equivalent to the regex:

([a-zA-Z]|_)([0-9]|[a-zA-Z])*
  • Related