I'm trying to extend an existing grammar using ANTLR4. Among other rules, the .g4 file defines the following:
Digit
: ZeroDigit
| NonZeroDigit
;
NonZeroDigit
: NonZeroOctDigit
| '8'
| '9'
;
NonZeroOctDigit
: '1'
| '2'
| '3'
| '4'
| '5'
| '6'
| '7'
;
OctDigit
: ZeroDigit
| NonZeroOctDigit
;
ZeroDigit
: '0' ;
SP
: ( WHITESPACE )+ ;
On top of that (literally, at the top of the file) I added the following rules, which are supposed to make use of these existing rules:
ttQL_Query
: ttQL_TimeClause SP;
ttQL_TimeClause
: FROM SP? ttQL_DateTime SP? TO SP? ttQL_DateTime;
ttQL_DateTime
: ttQL_Date ('T' ttQL_Time ttQL_Timezone)?;
ttQL_Timezone: 'Z' | ( '+' | '-' ) ttQL_Hour ':' ttQL_Minute;
ttQL_Date: ttQL_Year '-' ttQL_Month '-' ttQL_Day;
ttQL_Time: ttQL_Hour (':' ttQL_Minute (':' ttQL_Second (ttQL_Millisecond)?)?)?;
ttQL_Year: Digit Digit Digit Digit;
ttQL_Month: Digit Digit;
ttQL_Day: Digit Digit;
ttQL_Hour: Digit Digit ;
ttQL_Minute: Digit Digit ;
ttQL_Second: Digit Digit ;
ttQL_Millisecond: '.' ( Digit )+ ;
FROM : ( 'F' | 'f' ) ( 'R' | 'r' ) ( 'O' | 'o' ) ( 'M' | 'm' ) ;
TO : ( 'T' | 't' ) ( 'O' | 'o' ) ;
This is supposed to be an extension of the openCypher query language (the grammar can be found here: http://opencypher.org/resources/), but I can't get it to work. It is supposed to prefix a Cypher query. The rule for that is simple:
ttQL
: SP? ttQL_Query SP? oC_Cypher ;
All the other existing rules, including the ones I showed at the beginning, are used in oC_Cypher. I put all my rules at the top of the ANTLR file, and when I try to parse a query like the following:
FROM 2123-12-13T12:34:39Z TO 2123-12-13T14:34:39.2222Z MATCH (a)-[x]->(b) WHERE a.ping > 22 RETURN a.ping, b
I get the following error messages from my parser:
line 1:5 mismatched input '2123' expecting Digit
line 1:10 mismatched input '12' expecting Digit
line 1:13 mismatched input '13' expecting Digit
line 1:29 mismatched input '2123' expecting Digit
line 1:34 mismatched input '12' expecting Digit
line 1:37 mismatched input '13' expecting Digit
The weird thing is that when I put my part of the grammar in a new .g4 file and create a parser only for the prefix part FROM 2123-12-13T12:34:39Z TO 2123-12-13T14:34:39.2222Z
then everything works like a charm. I'm kind of lost here. I am using VS Code, Java, Maven, and the ANTLR4 plugin with ANTLR version 4.9.2, maven-compiler-plugin 3.10.1, and Java 11.
What could be the catch here?
CodePudding user response:
I suggest adding a fragment prefix to all lexer rules with the exception of Digit and SP.
CodePudding user response:
With the help of kaby's answer I was able to solve the problem. I don't know if this is the correct way of handling the issue, but it is sufficient for what I want to achieve. So please be careful with this solution if you have a similar problem.
As kaby noted, the lexer picks the token that matches the most characters. In the full grammar, '2123' is therefore lexed as a single integer token instead of four Digit tokens, which is why the parser rules expecting Digit failed. So I turned the date and time into lexer rules, so that the numbers would no longer be recognized as integers. Here is my working solution:
ttQL_Query
: ttQL_TimeClause SP?;
ttQL_TimeClause
: FROM SP? DATETIME SP? TO SP? DATETIME;
DATETIME: DATE ('T' TIME TIMEZONE)?;
TIMEZONE: 'Z' | ( '+' | '-' ) Digit Digit ':' Digit Digit;
DATE: Digit Digit Digit Digit '-' Digit Digit '-' Digit Digit;
TIME: Digit Digit (':' Digit Digit (':' Digit Digit ('.' (Digit)+ )?)?)?;
FROM : ( 'F' | 'f' ) ( 'R' | 'r' ) ( 'O' | 'o' ) ( 'M' | 'm' ) ;
TO : ( 'T' | 't' ) ( 'O' | 'o' ) ;
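The longest-match behaviour described above can be illustrated with a small toy tokenizer in plain Java. This is only a sketch of the rule "longest match wins, ties go to the rule declared first", not ANTLR's actual implementation. The class name MaximalMunchDemo, the rule names Integer and Dash, and the simplified date-only DATETIME pattern are assumptions made for illustration; Integer merely stands in for whatever integer token the Cypher grammar defines.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MaximalMunchDemo {

    // Toy lexer rules, kept in declaration order (hypothetical, simplified).
    static Map<String, Pattern> rules(boolean withDateTime) {
        Map<String, Pattern> rules = new LinkedHashMap<>();
        if (withDateTime) {
            // Simplified stand-in for a DATETIME lexer rule (date part only).
            rules.put("DATETIME", Pattern.compile("[0-9]{4}-[0-9]{2}-[0-9]{2}"));
        }
        rules.put("Digit", Pattern.compile("[0-9]"));
        // Hypothetical stand-in for the integer token of the host grammar.
        rules.put("Integer", Pattern.compile("[0-9]+"));
        rules.put("Dash", Pattern.compile("-"));
        return rules;
    }

    // At each position, pick the rule that consumes the most characters.
    // On a tie, the rule declared first wins (strict '>' comparison).
    static List<String> tokenize(String input, Map<String, Pattern> rules) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestName = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestName = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestName == null) {
                throw new IllegalArgumentException("No rule matches at index " + pos);
            }
            tokens.add(bestName + "(" + input.substring(pos, bestEnd) + ")");
            pos = bestEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Without a DATETIME rule the digits are swallowed by Integer:
        // prints [Integer(2123), Dash(-), Integer(12), Dash(-), Integer(13)]
        System.out.println(tokenize("2123-12-13", rules(false)));
        // With a DATETIME rule the whole date is one, longer token:
        // prints [DATETIME(2123-12-13)]
        System.out.println(tokenize("2123-12-13", rules(true)));
    }
}
```

In this toy model a parser rule asking for four Digit tokens never sees any, because the lexer has already grouped "2123" into one Integer token. That matches the "mismatched input '2123' expecting Digit" errors in the question. It also suggests why the standalone grammar worked: no longer-matching integer rule was present there.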