I am writing a parser with Chevrotain in TypeScript.
I have an expression of this form:
TERM:MATCH_TERM AND TERM:MATCH_TERM OR TERM:MATCH_TERM AND TERM:MATCH_TERM OR TERM:MATCH_TERM
OR takes precedence over AND, and the user can chain as many AND and OR operators as they want.
I managed to write a parser that gives me a result of this form:
Array_of_expressions: [expression1, expression2, ...]
Array_of_operators: [AND, OR, AND, AND, OR, ...]
These are the tokens and rules that produce this result:
const StringDoubleQuote = createToken({ name: "StringDoubleQuote", pattern: /"[^"\\]*(?:\\.[^"\\]*)*"/ });
const StringSimpleQuote = createToken({ name: "StringSimpleQuote", pattern: /'[^'\\]*(?:\\.[^'\\]*)*'/ });
const And = createToken({ name: "And", pattern: /(AND|and)/ });
const Or = createToken({ name: "Or", pattern: /(OR|or)/ });
const Not = createToken({ name: "Not", pattern: /(NOT|not)/ });
const Colon = createToken({ name: "Colon", pattern: /:/ });
const WhiteSpace = createToken({
name: "WhiteSpace",
pattern: /[ \t\n\r]+/,
group: Lexer.SKIPPED
});
const allTokens = [
WhiteSpace,
Colon,
And,
Or,
Not,
StringDoubleQuote,
StringSimpleQuote
];
class CustomParser extends CstParser {
private static INSTANCE: CustomParser | undefined;
public static get(): CustomParser {
if (CustomParser.INSTANCE === undefined) {
CustomParser.INSTANCE = new CustomParser();
}
return CustomParser.INSTANCE;
}
public readonly jsonLexer = new Lexer(allTokens);
constructor() {
super(allTokens, { nodeLocationTracking: "onlyOffset" })
this.performSelfAnalysis()
}
// In TypeScript the parsing rules are explicitly defined as class instance properties
// This allows for using access control (public/private/protected) and more importantly "informs" the TypeScript compiler
// about the API of our Parser, so referencing an invalid rule name (this.SUBRULE(this.oopsType);)
// is now a TypeScript compilation error.
public extractFirstExpression = this.RULE(RULES.extractFirstExpression, () => {
this.SUBRULE(this.extractNextExpression);
});
public extractNextExpression = this.RULE(RULES.extractNextExpression, () => {
this.MANY(() => this.OR([
{ ALT: () => this.SUBRULE(this.extractParentOperator) },
{ ALT: () => this.SUBRULE(this.extractParentExpression) }
]));
});
public extractParentOperator = this.RULE(RULES.extractParentOperator, () => {
this.OR([
{
ALT: () => {
this.CONSUME(And, { LABEL: TERMINAL_LABELS.AND_BETWEEN_GLOBAL_TERMS });
}
},
{
ALT: () => {
this.CONSUME(Or, { LABEL: TERMINAL_LABELS.OR_BETWEEN_GLOBAL_TERMS });
}
}
])
});
public extractParentExpression = this.RULE(RULES.extractParentExpression, () => {
this.SUBRULE(this.extractGenericTerm);
this.SUBRULE(this.extractMatchTerm);
});
....
}
Instead of this result, I would like a list of ORs, where each OR contains a list of ANDs and each AND contains the two expressions around it:
arrayOR = [OR, OR, ...]
Each OR would follow an interface like this:
interface OrNode {
type: "OrNode",
operator: "OR",
orOperatorChild: AndNode[]
}
interface AndNode {
type: "AndNode",
operator: "AND",
leftExpression: Expression,
rightExpression: Expression
}
interface Expression {
type: "Expression",
fullPart?: string,
mainTerm?: string,
valueMatch?: string,
...
}
As I understand it, to get this result I need a rule that looks for the OR operators; for each OR operator it enters a SUBRULE that looks for the AND operators, and for each AND operator it enters a SUBRULE that looks for the left and right expressions.
But my string starts with an expression, so how can I read the OR operator before the first expression and then enter the SUBRULE that reads the rest? Also, in some cases there is only a single expression with no operators: how do I handle that if I want OR to be the parent but there is no OR?
I was advised to use recursive rules and pointed to this example, but I don't see how to apply it to my case.
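One way to get OR as the parent level without changing the grammar is to post-process the two flat arrays: every OR closes the current AND group and opens a new one. The sketch below assumes the shape described above (one operator between each pair of expressions). `groupByOr`, `Operator` and `OrGroups` are hypothetical names, and plain strings stand in for the real Expression objects.

```typescript
// Hypothetical helper types; the real Expression interface would replace `string`.
type Operator = "AND" | "OR";
// Outer array: OR branches. Inner array: expressions joined by AND.
type OrGroups<T> = T[][];

function groupByOr<T>(expressions: T[], operators: Operator[]): OrGroups<T> {
  if (expressions.length === 0) return [];
  if (operators.length !== expressions.length - 1) {
    throw new Error("expected exactly one operator between each pair of expressions");
  }
  // The first expression always opens the first group, even when no OR exists.
  const groups: OrGroups<T> = [[expressions[0]]];
  operators.forEach((op, i) => {
    const next = expressions[i + 1];
    if (op === "OR") {
      groups.push([next]); // OR starts a new branch
    } else {
      groups[groups.length - 1].push(next); // AND extends the current branch
    }
  });
  return groups;
}

// A AND B OR C AND D OR E
const grouped = groupByOr(["a:1", "b:2", "c:3", "d:4", "e:5"], ["AND", "OR", "AND", "OR"]);
// A single expression with no operator
const single = groupByOr(["a:1"], []);
```

With this approach a lone expression simply becomes one OR branch holding one member, so the no-OR case needs no special handling.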
EDIT:
I changed the previous grammar rules to the following (I tried to add recursion):
public extractExpressions = this.RULE(RULES.extractExpressions, () => {
this.SUBRULE(this.extractParentExpression);
this.OPTION(() => this.OR([
{ ALT: () => this.SUBRULE(this.extractAndExpression) },
{ ALT: () => this.SUBRULE(this.extractOrExpression) }
]));
});
public extractAndExpression = this.RULE(RULES.extractAndExpression, () => {
this.CONSUME(And, { LABEL: TERMINAL_LABELS.AND_BETWEEN_GLOBAL_TERMS });
this.SUBRULE(this.extractExpressions);
});
public extractOrExpression = this.RULE(RULES.extractOrExpression, () => {
this.CONSUME(Or, { LABEL: TERMINAL_LABELS.OR_BETWEEN_GLOBAL_TERMS });
this.SUBRULE(this.extractExpressions);
});
public extractParentExpression = this.RULE(RULES.extractParentExpression, () => {
this.SUBRULE(this.extractGenericTerm);
this.SUBRULE(this.extractMatchTerm);
});
.....
Do these rules seem coherent and efficient to you? I get an object that lets me do what I want, but I wonder whether there is a more efficient or more elegant way to do it.
Thanks in advance for any help.
CodePudding user response:
As noted in the edit to the question, the way to get a recursive structure in the parser result in this case is to write the rules recursively, like this:
public extractExpressions = this.RULE(RULES.extractExpressions, () => {
this.SUBRULE(this.extractParentExpression);
this.OPTION(() => this.OR([
{ ALT: () => this.SUBRULE(this.extractAndExpression) },
{ ALT: () => this.SUBRULE(this.extractOrExpression) }
]));
});
public extractAndExpression = this.RULE(RULES.extractAndExpression, () => {
this.CONSUME(And, { LABEL: TERMINAL_LABELS.AND_BETWEEN_GLOBAL_TERMS });
this.SUBRULE(this.extractExpressions);
});
public extractOrExpression = this.RULE(RULES.extractOrExpression, () => {
this.CONSUME(Or, { LABEL: TERMINAL_LABELS.OR_BETWEEN_GLOBAL_TERMS });
this.SUBRULE(this.extractExpressions);
});
public extractParentExpression = this.RULE(RULES.extractParentExpression, () => {
this.SUBRULE(this.extractGenericTerm);
this.SUBRULE(this.extractMatchTerm);
});
.....
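These recursive rules produce a right-nested result: an expression, optionally followed by an operator and the rest of the chain. To reach the OR-of-ANDs shape from the question, that chain can be walked once, opening a new group at each OR. The sketch below uses a simplified, hypothetical `Chain` type as a stand-in for that nesting; it is not Chevrotain's actual CST type, and `toOrGroups` is likewise an illustrative name.

```typescript
// Simplified stand-in for the nested result of the recursive rules (hypothetical shape).
interface Chain {
  expression: string;
  next?: { operator: "AND" | "OR"; rest: Chain };
}

function toOrGroups(chain: Chain): string[][] {
  // The first expression opens the first OR branch.
  const groups: string[][] = [[chain.expression]];
  let link = chain.next;
  while (link !== undefined) {
    if (link.operator === "OR") {
      groups.push([link.rest.expression]); // OR starts a new branch
    } else {
      groups[groups.length - 1].push(link.rest.expression); // AND extends the current branch
    }
    link = link.rest.next;
  }
  return groups;
}

// a:1 AND b:2 OR c:3
const nested: Chain = {
  expression: "a:1",
  next: {
    operator: "AND",
    rest: { expression: "b:2", next: { operator: "OR", rest: { expression: "c:3" } } },
  },
};
const orGroups = toOrGroups(nested);
```

The walk is iterative, so a long operator chain does not deepen the call stack; mapping the real CST onto `Chain` (or walking it directly in a visitor) is left as an assumption here.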