i'm implementing a LL(1) parser for a project of doing a shell implementation. i'm stuck trying to resolve conflicts in my grammar :
Parsing mode: LL(1).
Grammar:
1. COMMAND_LINE -> COMPLETE_COMMAND PIPED_CMD
2. PIPED_CMD -> PIPE COMPLETE_COMMAND PIPED_CMD
3. | ε
4. COMPLETE_COMMAND -> CMD_PREFIX CMD CMD_SUFFIX
5. CMD_PREFIX -> REDIRECTION CMD_PREFIX
6. | ε
7. CMD_SUFFIX -> REDIRECTION CMD_SUFFIX
8. | CMD_ARG CMD_SUFFIX
9. | ε
10. REDIRECTION -> REDIRECTION_OP WORD
11. | ε
12. CMD -> WORD
13. CMD_ARG -> WORD CMD_ARG
14. | SINGLE_QUOTE WORD DOUBLE_QUOTE CMD_ARG
15. | DOUBLE_QUOTE WORD DOUBLE_QUOTE CMD_ARG
16. | ε
17. REDIRECTION_OP -> HERE_DOC
18. | APPEND
19. | INFILE
20. | OUTFILE
i use syntax-cli to check my grammar, and the ll(1) parser is a home made implementation, i can link my implementation of the parser if needed. the conflict detected by syntax-cli are :
PIPE | WORD | SINGLE_QUOTE | DOUBLE_QUOTE | HERE_DOC | APPEND | INFILE | OUTFILE | $ | |
---|---|---|---|---|---|---|---|---|---|
CMD_SUFFIX | 9 | 7/8 | 7/8 | 7/8 | 7/8 | 7/8 | 7/8 | 7/8 | 9 |
REDIRECTION | 11 | 11 | 11 | 11 | 10/11 | 10/11 | 10/11 | 10/11 | 11 |
CMD_ARG | 16 | 13/16 | 14/16 | 15/16 | 16 | 16 | 16 | 16 | 16 |
i've also tried this grammar :
COMMAND_LINE
: COMPLETE_COMMAND PIPED_CMD
;
PIPED_CMD
: PIPE COMPLETE_COMMAND PIPED_CMD
|
;
COMPLETE_COMMAND
: REDIRECTION CMD REDIRECTION CMD_ARG REDIRECTION
;
REDIRECTION
: REDIRECTION_OP WORD
|
;
CMD
: WORD
;
CMD_ARG
: WORD REDIRECTION CMD_ARG
| SINGLE_QUOTE WORD DOUBLE_QUOTE REDIRECTION CMD_ARG
| DOUBLE_QUOTE WORD DOUBLE_QUOTE REDIRECTION CMD_ARG
| REDIRECTION
;
REDIRECTION_OP
: HERE_DOC
| APPEND
| INFILE
| OUTFILE
;
but the parser don't work when using multiple redirections ...
CodePudding user response:
Without more specification on your behalf, can't be sure to have it all. But indeed, this grammar is ambiguous.
To build a LL(1) analyzer, you must be able to say, for any combination of symbol on the analyzer stack (symbol being either a terminal or non-terminal yet to read) and any word from the input buffer, what rule should apply.
Put yourself in the situation where you code starts with a WORD
(that is first thing that is in input buffer)
You start by trying to analyze COMMAND_LINE
If input buffer starts with WORD
, then only one rule can lead to COMMAND_LINE
, that is the rule COMPLETE_COMMAND PIPED_CMD
(anyway, whatever input, there is only this rule. Either we can apply it, or it is a syntax error. But for now, no reason to raise a syntax error, this rule is compatible with a start with WORD
).
So, now, on your stack you have COMPLETE_COMMAND PIPED_CMD
, and in input buffer, still the same WORD
.
The only possible rule for the top of the stack is COMPLETE_COMMAND -> CMD_PREFIX CMD CMD_SUFFIX
So, now, on your stack you have CMD_PREFIX CMD CMD_SUFFIX PIPED_CMD
.
And waiting in input buffer WORD
2 rules can be applied from CMD_PREFIX
:
CMD_PREFIX -> REDIRECTION CMD_PREFIX
or CMD_PREFIX -> ε
None of them can start with WORD
. So either we say that what we have here is an empty CMD_PREFIX
(followed by a CMD
starting with WORD
)
Or we can see it as a REDIRECTION
followed by an empty prefix. REDIRECTION
can be REDIRECTION -> ε
So both are possible at this point. Either we have a CMD_PREFIX(ε)
or we have a CMD_PREFIX(REDIRECTION(ε), ε)
(or even more recursions).
For the grammar to be LL(1), we should not have to go deeper to decide. From this point, with the only knowledge that next lexem is WORD
, we should be able to choose among those too. We aren't.
(In fact, even with other grammar than LL(1), we couldn't decide)