I would like to parse various SQL literals in ANTLR. Examples of the literal would be:
DATE '2020-01-01'
DATE '1992-11-23'
DATE '2014-01-01'
Would it be better to do the 'bare minimum' at the parsing stage and just put in something like:
date_literal
: 'DATE' STRING
;
Or, should I be doing any validation within the parser as well, for example something like:
date_literal
: 'DATE' DIG DIG DIG DIG '-' DIG DIG '-' DIG DIG
;
If I do the latter, I'll still need to validate further: even with a longer regex, I'll need to check things like the number of days in the month, leap years, a valid date range, etc.
What is usually the preferable method to do this? As in, how much 'validation' do you want to do in your grammar and how much in the listeners where the actual programming would be done? Additionally, are there any performance differences between doing (small) validations 'within the grammar' vs doing it in listeners/followers?
CodePudding user response:
These are actually two slightly different syntaxes (the second does not specify that the date should be surrounded by single quotes).
Based on your example, that may be an oversight, so I'll assume you mean both to require the single quotes, and that your STRING tokens are delimited by single quotes.
It's a design choice, but there are a couple of factors to consider:
- If you use the more specific grammar, then, if the user input doesn't match, you'll get the default ANTLR syntax error message (which will be "pretty good for a generated tool", but probably a bit opaque to your user).
- As you say, you'll still have to perform further validation in code either way.
I lean toward keeping the grammar as simple as possible and doing more validation in a listener (maybe a visitor). This allows you to be as clear with your error messages as possible.
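As a rough illustration of the kind of check a listener could run on the STRING token once the simple 'DATE' STRING rule has matched, here is a minimal Python sketch. The function name validate_date_literal and the error wording are made up for this example, and it assumes the token text still includes its surrounding single quotes; nothing here depends on the ANTLR runtime.

```python
import re
from datetime import date

# Shape check only: four digits, dash, two digits, dash, two digits.
_DATE_SHAPE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def validate_date_literal(string_token_text):
    """Return (date, None) on success or (None, message) on failure.

    string_token_text is the raw text of the STRING token, quotes
    included, e.g. "'2020-01-01'" (an assumption about the lexer).
    """
    # Drop the surrounding single quotes kept in the token text.
    body = string_token_text[1:-1]

    if not _DATE_SHAPE.fullmatch(body):
        return None, f"DATE literal '{body}' must have the form YYYY-MM-DD"

    year, month, day = (int(part) for part in body.split("-"))
    try:
        # date() rejects month 13, April 31, Feb 29 in non-leap years, etc.
        return date(year, month, day), None
    except ValueError as exc:
        return None, f"DATE literal '{body}' is not a valid calendar date ({exc})"
```

In a listener you would typically call something like this from the exit method generated for the date_literal rule, passing the STRING token's text, and report the message together with the token's line and column. Because the grammar only requires 'DATE' STRING, the user gets your message rather than a generic syntax error.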
The only reason I see to not use the 'DATE' STRING rule would be if there is some other string content that would NOT be a date_literal, but would be some other, valid syntax in your language. If the content could only ever be an invalid date literal, I'd use your simple rule and do the validation in code.