How to match any string between parentheses that can itself contain parentheses?

Time:05-25

I am trying to create a JavaCC parser and I am facing an issue.

I want to return everything between the parentheses in my text, but the string between them may itself contain parentheses.

For example, given the line Node(new MB34(MB78, MB654) => (MB7, M9)), I want to get the string "new MB34(MB78, MB654) => (MB7, M9)". There is no specific pattern between the parentheses.

I have tried to use lexical states, following the JavaCC documentation:

SKIP :
{ " " | "\t" | "\n" | "\r" | "\f" | "\r\n" }

TOKEN :
{
  < #LETTER             : ( [ "a"-"z" ] | [ "A"-"Z" ] ) >
| < #DIGIT              : [ "0"-"9" ] >
| < #ALPHA              : ( < LETTER > | < DIGIT > ) >
| < IDENTIFIER          : < LETTER > ( < ALPHA > )* >
}

TOKEN : {
  < "(" > : IN_LABEL
}

< IN_LABEL > TOKEN : {
  < TEXT_LABEL : ~[] >
}

< IN_LABEL > TOKEN : {
  < END_LABEL : ")"> : DEFAULT
}

String LABEL():
{
  Token token_label;
  String label = "";
}
{
  < IDENTIFIER >
  "(" ( token_label = < TEXT_LABEL > { label  = token_label.toString(); } )  < END_LABEL >
   {
     return label;
   }
}

However, since the string that leaves the lexical state IN_LABEL is the single character ")", this doesn't work: the parser matches all the text without ever returning to the DEFAULT state. I found a temporary workaround by replacing the END_LABEL token with:

< IN_LABEL > TOKEN : {
  < END_LABEL : ~[]")"> : DEFAULT
}

But that doesn't work either, because this token can match before the real end of the label.
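To make the failure concrete, here is a rough illustration in plain Java (not JavaCC) of what happens when the label ends at the first ")" encountered. The class and method names are hypothetical and exist only for this sketch.

```java
// Hypothetical helper for illustration only; it is not part of JavaCC.
class FirstCloseParen {
    /** Returns the text between the first "(" and the first ")" after it, or null if either is missing. */
    static String upToFirstClose(String input) {
        int open = input.indexOf('(');
        int close = input.indexOf(')', open + 1);
        return (open < 0 || close < 0) ? null : input.substring(open + 1, close);
    }

    public static void main(String[] args) {
        // The label is cut at the ")" that closes MB34(...), not at the outer one.
        System.out.println(upToFirstClose("Node(new MB34(MB78, MB654) => (MB7, M9))"));
    }
}
```

On the example line this yields the truncated text "new MB34(MB78, MB654", which is why the closing parenthesis cannot be recognised without tracking the nesting depth.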

Does anyone have a solution to this problem?

CodePudding user response:

There may be a simpler solution, but here's mine:

SKIP :
{ " " | "\t" | "\n" | "\r" | "\f" | "\r\n" }

TOKEN :
{
  < #LETTER             : ( [ "a"-"z" ] | [ "A"-"Z" ] ) >
| < #DIGIT              : [ "0"-"9" ] >
| < #ALPHA              : ( < LETTER > | < DIGIT > ) >
| < IDENTIFIER          : < LETTER > ( < ALPHA > )* >
}

TOKEN_MGR_DECLS :
{
    int parLevel;
}

MORE : {
    "(" : IN_LABEL
}

< IN_LABEL > TOKEN : {
    < TEXT_LABEL: ")" > {
        matchedToken.image = image.substring(1,image.length()-1);
    } : DEFAULT
}

< IN_LABEL > MORE : {
   <~["(", ")"]>
}

< IN_LABEL > MORE : {
    "(" {parLevel = 0;} : IN_LABEL1
}

< IN_LABEL1 > MORE : {
    // one level deeper for each further nested "("
    "(" { ++parLevel; }
}

< IN_LABEL1 > MORE : {
    ")" {
        if (0 == parLevel--) {
            SwitchTo(IN_LABEL);
        }
    }
}

< IN_LABEL1 > MORE : {
   <~["(", ")"]>
}

String LABEL():
{
  String label = "";
}
{
  < IDENTIFIER >
  label = < TEXT_LABEL >.image
   {
     return label;
   }
}
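The counting that the IN_LABEL and IN_LABEL1 states perform can also be sketched in plain Java, outside JavaCC. This is only a minimal illustration of the same depth-counting idea, not the token manager's actual code; the class and method names are hypothetical.

```java
// Hypothetical helper for illustration only; it mirrors the depth counting
// done by the lexical states above but is not generated by JavaCC.
class BalancedLabel {
    /** Returns the text between the first "(" and its matching ")", or null if unbalanced. */
    static String extract(String input) {
        int start = input.indexOf('(');
        if (start < 0) {
            return null; // no opening parenthesis at all
        }
        int depth = 0;
        for (int i = start; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '(') {
                depth++; // entering a (possibly nested) group
            } else if (c == ')') {
                depth--; // leaving a group
                if (depth == 0) {
                    // this ")" closes the outermost "(": drop both delimiters
                    return input.substring(start + 1, i);
                }
            }
        }
        return null; // ran out of input before the outer ")" was found
    }

    public static void main(String[] args) {
        System.out.println(extract("Node(new MB34(MB78, MB654) => (MB7, M9))"));
    }
}
```

Like the grammar, the sketch keeps inner whitespace and nested parentheses intact and strips only the outermost pair.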