RFC 3986 defines the rule:
path-empty = 0<pchar>
For simplicity, let's assume pchar is defined as:
pchar = 'a' / 'b' / 'c'
What does path-empty match, and how is it matched?
I've read the Wikipedia page on ABNF. My guess is that it matches the empty string (regex ^(?![\s\S])). If that is the case, why even reference pchar? Is there not a simpler way to match the empty string in ABNF syntax without referencing another rule?
How could this be translated to ANTLR4?
CodePudding user response:
Yes, you are correct. path-empty derives the empty string.
In ABNF, the right-hand side of a rule must contain at least one element, which is anything other than spaces, newlines, and comments. See RFC 5234, page 10. Given this syntax, there are several ways to define the empty string. path-empty = 0<pchar> is one way. It means "exactly zero of <pchar>". But path-empty = "" and path-empty = 0pchar would also work. ABNF does not define whether one is preferred over the other.
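To see why a zero repetition can only derive the empty string, here is a minimal Python sketch that models ABNF's <n>element as a regex repetition. The helper names repeat_exactly and matches are hypothetical, PCHAR is the simplified three-letter pchar from the question, and ABNF's case-insensitivity for quoted strings is ignored here.

```python
import re

# Simplified pchar from the question: 'a' / 'b' / 'c' (assumption, not the real RFC 3986 rule).
PCHAR = "[abc]"

def repeat_exactly(element: str, count: int) -> re.Pattern:
    """Model ABNF's <n>element (shorthand for <n>*<n>element) as a regex."""
    return re.compile(rf"(?:{element}){{{count}}}")

def matches(pattern: re.Pattern, text: str) -> bool:
    """True only if the pattern derives the whole input string."""
    return pattern.fullmatch(text) is not None

# path-empty = 0<pchar>: zero repetitions of pchar.
path_empty = repeat_exactly(PCHAR, 0)

if __name__ == "__main__":
    print(matches(path_empty, ""))   # empty input is accepted
    print(matches(path_empty, "a"))  # any non-empty input is rejected
```

Whatever element is repeated, a count of zero consumes no input, which is why 0<pchar>, 0pchar, and "" all behave the same way.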
Note that the RFC 3986 spec uses a prose-val, i.e., <pchar>, instead of the rulename pchar (or even "", 0pchar, or just <empty string>). It's unclear why, but it has the same effect: path-empty derives the empty string. Still, <pchar> is not the same as pchar. A prose-val is a "last resort" way to add a non-formal description to a rule.
In ANTLR4, the rule would just be path_empty : ; (a rule with a single empty alternative). Note that ANTLR has a naming convention that defines a strict boundary between lexer and parser: lexer rule names start with an uppercase letter, parser rule names with a lowercase letter. ABNF does not have this distinction. In fact, this grammar could be converted to a single ANTLR lexer grammar, an exercise in understanding the power of ANTLR lexers.