ANTLR rule for Digit is not recognizing digits

Time:02-05

I'm trying to extend an existing grammar using ANTLR4. Among other rules, the .g4 file defines the following:

Digit
 :  ZeroDigit
     | NonZeroDigit
     ;

NonZeroDigit
            :  NonZeroOctDigit
                | '8'
                | '9'
                ;

NonZeroOctDigit
               :  '1'
                   | '2'
                   | '3'
                   | '4'
                   | '5'
                   | '6'
                   | '7'
                   ;

OctDigit
        :  ZeroDigit
            | NonZeroOctDigit
            ;

ZeroDigit
         :  '0' ;


SP
  :  ( WHITESPACE )+ ;

On top of that (literally, at the top of the file) I added the following rules, which are supposed to make use of these existing rules:

ttQL_Query
     : ttQL_TimeClause SP;

ttQL_TimeClause
     : FROM SP? ttQL_DateTime SP? TO SP? ttQL_DateTime; 

ttQL_DateTime
    : ttQL_Date ('T' ttQL_Time ttQL_Timezone)?;

ttQL_Timezone: 'Z' | ( '+' | '-' ) ttQL_Hour ':' ttQL_Minute;

ttQL_Date: ttQL_Year '-' ttQL_Month '-' ttQL_Day;
ttQL_Time: ttQL_Hour (':' ttQL_Minute (':' ttQL_Second (ttQL_Millisecond)?)?)?;

ttQL_Year: Digit Digit Digit Digit;
ttQL_Month: Digit Digit;
ttQL_Day: Digit Digit;

ttQL_Hour: Digit Digit ;
ttQL_Minute: Digit Digit ;
ttQL_Second: Digit Digit ;
ttQL_Millisecond: '.' ( Digit )+ ;


FROM : ( 'F' | 'f' ) ( 'R' | 'r' ) ( 'O' | 'o' ) ( 'M' | 'm' ) ;
TO : ( 'T' | 't' ) ( 'O' | 'o' ) ;

This is supposed to be an extension of the openCypher query language (the grammar can be found here: http://opencypher.org/resources/), but I can't get it to work. It is supposed to prefix a Cypher query. The rule for that is simple:

ttQL
     : SP? ttQL_Query SP? oC_Cypher ;

All the other existing rules, including the ones I listed at the beginning, are used in oC_Cypher. I put all my rules at the top of the ANTLR file, and when I try to parse a query like the following:

FROM 2123-12-13T12:34:39Z TO 2123-12-13T14:34:39.2222Z MATCH (a)-[x]->(b) WHERE a.ping > 22 RETURN a.ping, b

I get the following error messages from my parser:

line 1:5 mismatched input '2123' expecting Digit
line 1:10 mismatched input '12' expecting Digit
line 1:13 mismatched input '13' expecting Digit
line 1:29 mismatched input '2123' expecting Digit
line 1:34 mismatched input '12' expecting Digit
line 1:37 mismatched input '13' expecting Digit

The weird thing is that when I put my part of the grammar in a new .g4 file and create a parser only for the prefix part (FROM 2123-12-13T12:34:39Z TO 2123-12-13T14:34:39.2222Z), everything works like a charm. I'm kind of lost here. I am using VS Code, Java 11, Maven, and the ANTLR4 plugin with ANTLR version 4.9.2 and maven-compiler-plugin 3.10.1.

What could be the catch here?

CodePudding user response:

I suggest adding the fragment prefix to all lexer rules with the exception of Digit and SP. A fragment rule can only be used inside other lexer rules and never produces a token of its own.
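
To illustrate the suggestion, here is a minimal sketch of the digit rules with the helpers marked as fragments. It assumes the helper rules are referenced only from other lexer rules, which the question does not confirm; any helper that a parser rule uses directly would have to stay a normal token. A fragment change by itself may also not be enough here: if the grammar contains an integer literal rule that matches a longer run of digits, the lexer would presumably still prefer it for input such as 2123.

```antlr
// Sketch only, not the original grammar.
// Digit stays a real token; the helpers become fragments,
// so they can no longer be emitted as tokens of their own.
Digit
    : ZeroDigit
    | NonZeroDigit
    ;

fragment NonZeroDigit
    : NonZeroOctDigit
    | '8'
    | '9'
    ;

// Same alternatives as '1' | '2' | ... | '7', written as a character set.
fragment NonZeroOctDigit
    : [1-7]
    ;

fragment OctDigit
    : ZeroDigit
    | NonZeroOctDigit
    ;

fragment ZeroDigit
    : '0'
    ;
```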

CodePudding user response:

With the help of kaby's answer I was able to solve the problem. I don't know whether this is the correct way of handling the issue, but it is sufficient for what I want to achieve. So please be careful with this solution if you have a similar problem.

As kaby noted, the lexer searches for the token that matches the most characters, which is why 2123 came out as one integer token instead of four Digit tokens. So I turned the date and time into lexer rules, so that the numbers would no longer be recognized as integers. Here is my working solution:

ttQL_Query
     : ttQL_TimeClause SP?;

ttQL_TimeClause
     : FROM SP? DATETIME SP? TO SP? DATETIME; 

DATETIME:  DATE ('T' TIME TIMEZONE)?;

TIMEZONE: 'Z' | ( '+' | '-' ) Digit Digit ':' Digit Digit;

DATE: Digit Digit Digit Digit '-' Digit Digit '-' Digit Digit;
TIME: Digit Digit (':' Digit Digit (':' Digit Digit ('.' (Digit)+ )?)?)?;


FROM : ( 'F' | 'f' ) ( 'R' | 'r' ) ( 'O' | 'o' ) ( 'M' | 'm' ) ;
TO : ( 'T' | 't' ) ( 'O' | 'o' ) ;
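
A possible refinement of the solution above, as a sketch rather than a tested grammar: if no parser rule refers to DATE, TIME or TIMEZONE directly, they could be declared as fragments so that DATETIME is the only token the lexer can emit for this input. The '+' alternative in TIMEZONE and the repeated Digit after the '.' are assumptions based on the usual ISO 8601 form and on the .2222 in the example query; Digit is the rule from the existing grammar.

```antlr
// Sketch only: DATETIME is the single token the parser sees.
DATETIME
    : DATE ( 'T' TIME TIMEZONE )?
    ;

// Fragments are building blocks for DATETIME and never become tokens themselves.
fragment DATE
    : Digit Digit Digit Digit '-' Digit Digit '-' Digit Digit
    ;

// Hours, then optional minutes, seconds and fractional seconds.
fragment TIME
    : Digit Digit ( ':' Digit Digit ( ':' Digit Digit ( '.' Digit+ )? )? )?
    ;

// 'Z' for UTC, or a signed hh:mm offset ('+' is an assumption, see above).
fragment TIMEZONE
    : 'Z'
    | ( '+' | '-' ) Digit Digit ':' Digit Digit
    ;
```

With this layout a string like 2123-12-13 can only be tokenized as DATETIME, so there is no need to rely on rule order to choose between DATE and DATETIME.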