Home > Net >  Why is parsing of aaaaabb not successful assuming the given grammar?
Why is parsing of aaaaabb not successful assuming the given grammar?

Time:10-13


Suppose grammar G has the following productions …

  1. S → aSb | A | B
  2. A → λ | a | aa | aaa
  3. B → bB | b

Using grammar G, a derivation of the string aaaaabb is …

S ⇒ aSb ⇒ aaSbb ⇒ aaAbb ⇒ aaaaabb


In the Raku grammar shown below, why is parsing of the string aaaaabb not successful? Is the derivation shown above not applicable?


# Goal: language generated by grammar = { a^n b^m : n <= m   3 }
# Example string is:
#   aaaaabb = a^5 b^2, where 5 <= 2   3
# Derivation of string aaaaabb is:
#   S => aSb => aaSbb => aaAbb => aaaaabb

grammar MyContextFreeGrammar
    {
    token TOP { <S> }
    token S   { 'a' <S> 'b' | <A> | <B> }
    token A   { '' | 'a' | 'a' 'a' | 'a' 'a' 'a' }
    token B   { 'b' <B> | 'b' }
    } # end grammar

my $abString = 'aaaaabb';

my $myParseResult = MyContextFreeGrammar.parse( $abString ) ;

if $myParseResult.Bool
  {
  printf( "\nParsing of \'%s\' was *SUCCESSFUL*.\n", $abString ) ;
  }
else
  {
  printf( "\nParsing of \'%s\' was *NOT* successful.\n", $abString ) ;
  }

Program output is …

Parsing of 'aaaaabb' was *NOT* successful.

CodePudding user response:

Raku grammars are parsers, not generative grammars

A natural idiomatic solution to the problem example:

grammar S { token TOP { (a*) (b*) <?{ $0.chars <= $1.chars   3 }> } }

This directly encodes your example's description in a parser -- the token does a match/capture of one or more as, then a match/capture of one or more bs, and finally a predicate check corresponding to your example's formula.


Test code and results:

say "$_: {so S.parse: $_}" for ^9 .map: { 'a' x $_ ~ 'bb' }

bb: False
abb: True
aabb: True
aaabb: True
aaaabb: True
aaaaabb: True
aaaaaabb: False
aaaaaaabb: False
aaaaaaaabb: False

If you combine two or more CFGs the result may be ambiguous even if the constituent grammars are unambiguous. There is no algorithm that can detect if the result is ambiguous so there can't even be a warning or error if a composition yields this result.

One element of the original vision and mission for Raku was supporting grammar (and grammar fragment) composition. So generative grammars were out for its built in grammar construct.


To be clear, grammar composition isn't so much about a nice-to-have (though it is very, very nice to have), but something that's increasingly essential. Folk want to be able to do things like embed SQL in GPL code. Now what? If the SQL is in a string you're subject to injection attacks. Larry Wall decided there had to be a better way, a principled way. The result was Raku grammars.

cf my comments in the reddit thread Ergonomic inline SQL as a Python library. Two key excerpts:

(One key thing I forgot to mention and that might not be obvious is that Raku slangs are fully integrated at compile time. So there can be compile-time syntax, type, etc. errors, and full defense against injection attacks, which is a lot harder if the DSL (eg SQL) is in strings.)

Wow, I had no idea that such a language "braid" idea existed. Thanks for sharing that little bit of Raku. The "sql insert into ...; with" line is especially cool, since you literally are writing multiple languages in each other with very little syntax noise.

  • Related