I am working on a lexer. I need to identify patterns in BNF and specify them using Regular expressions. I know there are TOKENS e.g. keywords, identifiers, operators etc. So far I have defined Regular expressions:
digit=[0-9]
integer={digit}
letter=[a-zA-Z]
But the given BNF rule for Identifier is:
< id > ::= < letter >
| "_"
| < id > < digit >
| < id > < letter >
Since the <id> is defined by Recursive pattern, how can I express this pattern using Regular expression.
I tried this Regular expression: id={letter} |"_"|{id}{digit}
for the above BNF rule but it is giving me error that Regular expression contains cycle.
CodePudding user response:
Looking at the BNF we can see that an <id>
can begin with a letter or underscore. We can also see that an <id>
can be followed by either a digit or letter and it is still a valid <id>
. This is implies that an <id>
begins with either a letter or underscore and can be followed by 0 or more digits or letters. This suggests the following regular expression:
id = [a-zA-Z_][0-9a-zA-Z]*
[a-zA-Z_]
Matches a letter or '_'[0-9a-zA-Z]*
Matches a digit or letter 0 or more times.
But since you already have {digit} and {letter} already defined as individual character classes, this would be using JFlex (I am not that familiar with JFlex, so I may not have the JFlex syntax exactly right):
id = ({letter}|_)({digit}|{letter})*
This would be equivalent to the regex:
([a-zA-Z]|_)([0-9]|[a-zA-Z])*