I am trying to write a grammar that would allow strings to be cast to a specific type. Here is what I have thus far as a working model:
grammar Test;
root
: values EOF
;
values
: value (NEWLINE value)* NEWLINE*
;
value
: string CAST_OPERATOR type
;
string
: S_QUOTE WORD S_QUOTE
;
type
: 'date' | 'string'
;
WORD
: [a-zA-Z0-9-]+
;
CAST_OPERATOR
: '::'
;
NEWLINE
: '\n'
;
S_QUOTE
: '\''
;
# input.txt
'2014-01-01'::date
'hello'::string
Notice that in the above grammar, the type is one of two options: a date or a string. However, in the application, I would also like to allow the user to create a new type in a separate statement. For example, they can do something like:
CREATE TYPE <name> AS <...whatever it is>
And so perhaps a user can create a type called percent which accepts a value between 0 and 100 and does some stuff with it, allowing the following valid input now:
'82'::percent
It is going to be very rare for someone to enter a custom type, and it has to be created with a separate statement. What is the normal way to handle this in ANTLR? For example, would:
The type section be re-compiled every time a new type is added? Perhaps having something like this, conceptually:
type : 'date' | 'string' | <#insert-custom-types-here> ;
Use a generic 'identifier' and then do the validation outside of the grammar? I feel like this might be the most extensible version, but there's about a 0.1% chance the user has added their own type and about a 0.0001% chance that a user has added ten or more types. So it's something that seems like it might be handled best in-line in the grammar if that's possible (in other words, it's not like an actual programming language where someone could define variables at will, perhaps even having 1000s of variables in a program).
Something else?
CodePudding user response:
I don't think you're going to find it at all practical to modify and recompile the grammar when a new type is added.
It's pretty common for developers writing grammars to want to capture "all the things" in the grammar (or, at least, as "many things" as possible). In my experience, this usually comes back to bite you. Generally, it seems to work better if you have the minimal grammar needed to unambiguously recognize the right way to interpret the input. This will yield a parse tree, and you can then write your own code in visitors/listeners to do the additional validation.
You can look at your concern similarly to how you'd handle the definition of variables. Clearly you don't modify the grammar when a developer adds a new variable. You'll capture that, and then in your semantic validation you'll check whether the variable is defined (or implicitly define it, depending upon the language).
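As a rough sketch of that semantic check, assume the grammar's type rule has been relaxed to accept any identifier. The snippet below keeps a registry of known type names and checks each cast against it. It does not use the ANTLR runtime: the regular expressions only stand in for the parse-tree walk a listener or visitor would do, and TypeRegistry, check_statements, and the simplified CREATE TYPE body are hypothetical names made up for illustration.

```python
import re

# Stand-ins for what the parse tree would provide; in a real project these
# values would come from listener/visitor callbacks, not from regexes.
CAST_RE = re.compile(r"^'([A-Za-z0-9-]+)'::([A-Za-z_][A-Za-z0-9_]*)$")
CREATE_RE = re.compile(r"^CREATE TYPE ([A-Za-z_][A-Za-z0-9_]*) AS .+$")


class TypeRegistry:
    """Hypothetical symbol table of type names (built-in plus user-defined)."""

    def __init__(self):
        self._names = {"date", "string"}  # the built-ins from the question

    def define(self, name):
        self._names.add(name)

    def is_defined(self, name):
        return name in self._names


def check_statements(lines, registry):
    """Return a list of (line_number, message) pairs for semantic errors."""
    errors = []
    for number, line in enumerate(lines, start=1):
        created = CREATE_RE.match(line)
        if created:
            # A CREATE TYPE statement only adds a name to the registry.
            registry.define(created.group(1))
            continue
        cast = CAST_RE.match(line)
        if cast is None:
            errors.append((number, "not a cast or CREATE TYPE statement"))
        elif not registry.is_defined(cast.group(2)):
            errors.append((number, f"unknown type '{cast.group(2)}'"))
    return errors


statements = [
    "'2014-01-01'::date",
    "'82'::percent",               # used before it is defined
    "CREATE TYPE percent AS int",  # simplified, made-up definition body
    "'82'::percent",               # valid from here on
]
errors = check_statements(statements, TypeRegistry())
print(errors)  # [(2, "unknown type 'percent'")]
```

The grammar never changes when a type is added; only the registry does, which is the same division of labour as a symbol table for variables.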
This allows you to write better error messages than you'd get with the standard ANTLR error messages as well.
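To illustrate the kind of message you can produce once the check lives in your own code, here is a minimal sketch that suggests a similarly named type. It uses difflib from the Python standard library; KNOWN_TYPES and describe_unknown_type are hypothetical names, and the line and column values are assumed to come from the offending token in the parse tree.

```python
import difflib

# Assumed registry contents: the two built-ins plus one user-defined type.
KNOWN_TYPES = ["date", "string", "percent"]


def describe_unknown_type(name, line, column, known=KNOWN_TYPES):
    """Build a domain-specific message for a cast to an undefined type."""
    message = f"line {line}:{column} unknown type '{name}'"
    # get_close_matches returns the best candidates above a similarity cutoff.
    suggestions = difflib.get_close_matches(name, known, n=1)
    if suggestions:
        message += f"; did you mean '{suggestions[0]}'?"
    else:
        message += "; define it first with CREATE TYPE"
    return message


print(describe_unknown_type("percnt", line=3, column=6))
print(describe_unknown_type("money", line=4, column=9))
```

A grammar that hard-codes the type names can only report that a token was unexpected; a check like this can say which name was wrong and what the user probably meant.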
Using the generic 'identifier' would definitely be the way to go.