I am trying to create a JavaCC parser and I am facing an issue.
I want to return everything between a pair of parentheses in my text, but the string between those parentheses may itself contain parentheses.
For example, given this line:
Node(new MB34(MB78, MB654) => (MB7, M9))
I want to get the string "new MB34(MB78, MB654) => (MB7, M9)". The text between the parentheses follows no specific pattern.
I have tried to use lexical states, following the JavaCC documentation:
SKIP :
{ " " | "\t" | "\n" | "\r" | "\f" | "\r\n" }
TOKEN :
{
< #LETTER : ( [ "a"-"z" ] | [ "A"-"Z" ] ) >
| < #DIGIT : [ "0"-"9" ] >
| < #ALPHA : ( < LETTER > | < DIGIT > ) >
| < IDENTIFIER : < LETTER > ( < ALPHA > )* >
}
TOKEN : {
< "(" > : IN_LABEL
}
< IN_LABEL > TOKEN : {
< TEXT_LABEL : ~[] >
}
< IN_LABEL > TOKEN : {
< END_LABEL : ")"> : DEFAULT
}
String LABEL():
{
Token token_label;
String label = "";
}
{
< IDENTIFIER >
"(" ( token_label = < TEXT_LABEL > { label = token_label.toString(); } ) < END_LABEL >
{
return label;
}
}
However, this doesn't work. Because the string that exits the IN_LABEL lexical state is the single character ")", which TEXT_LABEL also matches, the parser consumes all of the text without ever returning to the DEFAULT state. As a temporary workaround, I replaced the END_LABEL token with:
< IN_LABEL > TOKEN : {
< END_LABEL : ~[]")"> : DEFAULT
}
But it doesn't work either because this token can match before the real end of the label.
Does anyone have a solution to this problem?
Answer:
There may be a simpler solution, but here's mine:
SKIP :
{ " " | "\t" | "\n" | "\r" | "\f" | "\r\n" }
TOKEN :
{
< #LETTER : ( [ "a"-"z" ] | [ "A"-"Z" ] ) >
| < #DIGIT : [ "0"-"9" ] >
| < #ALPHA : ( < LETTER > | < DIGIT > ) >
| < IDENTIFIER : < LETTER > ( < ALPHA > )* >
}
TOKEN_MGR_DECLS :
{
int parLevel;
}
MORE : {
"(" : IN_LABEL
}
< IN_LABEL > TOKEN : {
< TEXT_LABEL: ")" > {
matchedToken.image = image.substring(1,image.length()-1);
} : DEFAULT
}
< IN_LABEL > MORE : {
<~["(", ")"]>
}
< IN_LABEL > MORE : {
"(" {parLevel = 0;} : IN_LABEL1
}
< IN_LABEL1 > MORE : {
"(" { parLevel++; }
}
< IN_LABEL1 > MORE : {
")" {
if (0 == parLevel--) {
SwitchTo(IN_LABEL);
}
}
}
< IN_LABEL1 > MORE : {
<~["(", ")"]>
}
String LABEL():
{
String label = "";
}
{
< IDENTIFIER >
label = < TEXT_LABEL >.image
{
return label;
}
}
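To make the counting logic easier to follow outside of JavaCC, here is a minimal plain-Java sketch of the same idea: skip to the first "(", track the nesting depth, and stop at the ")" that closes that first parenthesis. The class name LabelExtractor and the method extractLabel are hypothetical names chosen for this illustration; they are not part of the grammar above or of JavaCC.

```java
class LabelExtractor {
    // Returns the text between the first "(" and its matching ")",
    // or null if there is no "(" or the parentheses are unbalanced.
    static String extractLabel(String input) {
        int start = input.indexOf('(');
        if (start < 0) {
            return null;
        }
        int depth = 0; // nesting depth inside the outer parentheses
        for (int i = start + 1; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                if (depth == 0) {
                    // This ")" closes the outer "(", so the label ends here.
                    return input.substring(start + 1, i);
                }
                depth--;
            }
        }
        return null; // ran out of input before the outer ")" was found
    }

    public static void main(String[] args) {
        System.out.println(extractLabel("Node(new MB34(MB78, MB654) => (MB7, M9))"));
    }
}
```

In the grammar, the two lexical states IN_LABEL and IN_LABEL1 together with parLevel play the role that the single depth counter plays in this sketch.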