Implement heredocs with trim indent using PEG.js

Time:10-16

I'm working on a language similar to Ruby called Gaiman, and I'm using PEG.js to generate the parser.

Do you know if there is a way to implement heredocs with proper indentation?

xxx =  <<<END
       hello
       world
       END

The output should be:

"hello
world"

I need this because this code doesn't look very nice:

def foo(arg) {
  if arg == "here" then
     return <<<END
xxx
  xxx
END
  end
end

This is a function where the user wants to return:

"xxx
  xxx"

I would prefer the code to look like this:

def foo(arg) {
  if arg == "here" then
     return <<<END
            xxx
              xxx
            END
  end
end

If I trim all the lines, the user won't be able to use a string with leading spaces when they want to. Does anyone know if PEG.js allows this?

I don't have any code for heredocs yet; I just want to know whether what I want is possible.

EDIT:

So I've tried to implement heredocs, and the problem is that PEG doesn't allow back-references.

heredoc = "<<<" marker:[\w]+ "\n" text:[\s\S]+ marker {
    return text.join('');
}

It says that marker is not defined. As for trimming, I think I can use the location() function.

CodePudding user response:

I don't think that's a reasonable expectation for a parser generator; few if any would be equal to the challenge.

For a start, recognising the here-string syntax is inherently context-sensitive, since the end-delimiter must be a precise copy of the delimiter provided after the <<< token. So you would need a custom lexical analyser, and that means that you need a parser generator which allows you to use a custom lexical analyser. (So a parser generator which assumes you want a scannerless parser might not be the optimal choice.)

Recognising the end of the here-string token shouldn't be too difficult, although you can't do it with a single regular expression. My approach would be to use a custom scanning function which breaks the here-string into a series of lines, concatenating them as it goes until it reaches a line containing only the end-delimiter.
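
The line-by-line scan described above could be sketched roughly as follows. This is plain JavaScript, not anything provided by PEG.js; `scanHeredoc` and the shape of its return value are hypothetical, and it assumes the delimiter is whatever follows <<< on the same line and that the closing line may have whitespace around the delimiter.

```javascript
// Hypothetical helper, not part of PEG.js or Peggy.
// `start` is the offset of "<<<" in `source`.
function scanHeredoc(source, start) {
  if (!source.startsWith("<<<", start)) {
    throw new Error("Expected <<< at offset " + start);
  }
  // The delimiter is everything between "<<<" and the end of that line.
  let lineEnd = source.indexOf("\n", start);
  if (lineEnd === -1) {
    throw new Error("Unterminated heredoc");
  }
  const marker = source.slice(start + 3, lineEnd).trim();
  if (marker === "") {
    throw new Error("Missing heredoc delimiter");
  }
  const lines = [];
  let pos = lineEnd + 1;
  while (pos <= source.length) {
    lineEnd = source.indexOf("\n", pos);
    if (lineEnd === -1) lineEnd = source.length;
    const line = source.slice(pos, lineEnd);
    // Assumption: the closing line holds only the delimiter, ignoring
    // surrounding whitespace.
    if (line.trim() === marker) {
      // `end` is the offset just past the closing line (newline excluded).
      return { marker, lines, end: lineEnd };
    }
    lines.push(line);
    pos = lineEnd + 1;
  }
  throw new Error(`Unterminated heredoc: no line containing only "${marker}"`);
}
```

The untrimmed lines are returned as an array so that indentation handling can stay a separate step.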

Once you've recognised the text of the literal, all you need to normalise the spaces in the way you want is the column number at which the <<< starts. With that, you can trim each line in the string literal. So you only need a lexical scanner which accurately reports token position. Trimming wouldn't normally be done inside the generated lexical scanner; rather, it would be done in the associated semantic action. (Equally, it could be a semantic action in the grammar. But it's always going to be code that you write.)
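
As a rough illustration of that trimming step, here is a minimal sketch in plain JavaScript. `trimHeredoc` is a hypothetical name; it assumes the column is 1-based, that indentation consists of spaces only, and that lines with too little indentation are simply left unchanged.

```javascript
// Hypothetical helper: `column` is the 1-based column where "<<<" starts,
// so each line is expected to carry `column - 1` leading spaces.
function trimHeredoc(lines, column) {
  const indent = " ".repeat(column - 1);
  return lines
    // Assumption: under-indented lines are passed through untouched.
    .map(line => (line.startsWith(indent) ? line.slice(indent.length) : line))
    .join("\n");
}

// The first example from the question: "<<<" starts at column 8.
console.log(JSON.stringify(trimHeredoc(["       hello", "       world"], 8)));
// prints "hello\nworld"
```

Because only the first `column - 1` spaces are removed, any extra leading spaces the user typed are preserved in the result.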

When you trim the literal, you'll need to deal with the cases in which it is impossible, because the user has not respected the indentation requirement. And you'll need to do something with tab characters; getting those right probably means that you'll want a lexical scanner which computes visible column positions rather than character offsets.
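
To make the tab and error cases concrete, here is a minimal sketch in plain JavaScript. `stripVisibleIndent` is a hypothetical name, and the tab width of 8, the exception on under-indented lines, and the special treatment of blank lines are all assumptions rather than anything the language has to do.

```javascript
// Hypothetical helper: removes `width` visible columns of leading whitespace
// from `line`. A tab advances to the next multiple of `tabWidth`.
function stripVisibleIndent(line, width, tabWidth = 8) {
  let col = 0; // visible column reached so far (0-based)
  let i = 0;   // character offset into `line`
  while (col < width && i < line.length) {
    const ch = line[i];
    if (ch === " ") col += 1;
    else if (ch === "\t") col += tabWidth - (col % tabWidth);
    else break;
    i += 1;
  }
  if (col < width) {
    // Assumption: whitespace-only lines are always accepted and become empty.
    if (line.trim() === "") return "";
    throw new Error(`Line is indented ${col} columns, expected at least ${width}`);
  }
  // A tab may overshoot the target column; pad with spaces for the overshoot.
  return " ".repeat(col - width) + line.slice(i);
}
```

For instance, with a required indentation of 7 columns, a line starting with a single tab reaches column 8, so one space is kept in front of the text. Whether an under-indented line should be an error or silently accepted is a design decision for the language.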

I don't know if PEG.js meets those requirements, since I don't use it. (I did look at the documentation, and failed to see any indication as to how you might incorporate a custom scanner function. But that doesn't mean there isn't a way to do it.) I hope that the discussion above at least lets you check the detailed documentation for the parser generator you want to use, and otherwise find a different parser generator which will work for you in this use case.

CodePudding user response:

Here is an implementation of heredocs in Peggy, the successor to PEG.js, which is no longer maintained. This code is based on a GitHub issue.

heredoc = "<<<" begin:marker "\n" text:($any_char+ "\n")+ _ end:marker (
    &{ return begin === end; }
  / '' { error(`Expected matched marker "${begin}", but marker "${end}" was found`); }
) {
    const loc = location();
    const min = loc.start.column - 1;
    const re = new RegExp(`\\s{${min}}`);
    return text.map(line => {
        return line[0].replace(re, '');
    }).join('\n');
}
any_char = (!"\n" .)
marker_char = (!" " !"\n" .)
marker "Marker" = $marker_char+

_ "whitespace"
  = [ \t\n\r]* { return []; }

EDIT: The grammar above didn't work when other code followed the heredoc; here is a better grammar:

{ let heredoc_begin = null; }

heredoc = "<<<" beginMarker "\n" text:content endMarker {
    const loc = location();
    const min = loc.start.column - 1;
    const re = new RegExp(`^\\s{${min}}`, 'mg');
    return {
        type: 'Literal',
        value: text.replace(re, '')
    };
}
__ = (!"\n" !" " .)
marker 'Marker' = $__+
beginMarker = m:marker { heredoc_begin = m; }
endMarker = "\n" " "* end:marker &{ return heredoc_begin === end; }
content = $(!endMarker .)*
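
One caveat that may be worth checking in the trimming regex: in JavaScript, \s also matches newline characters, so with the m flag an empty line inside the heredoc can be consumed together with part of the next line's indentation. The sketch below runs outside Peggy in plain JavaScript; `trimIndent` is a hypothetical helper, and it assumes that only spaces and tabs count as indentation.

```javascript
// Hypothetical helper: strips exactly `min` leading spaces/tabs per line.
function trimIndent(text, min) {
  const re = new RegExp(`^[ \\t]{${min}}`, "mg");
  return text.replace(re, "");
}

// Heredoc body with an empty line in the middle, indented by 7 columns.
const body = "       hello\n\n       world";

// Same pattern as in the grammar action, using \s.
const withWhitespaceClass = body.replace(new RegExp(`^\\s{7}`, "mg"), "");
console.log(JSON.stringify(withWhitespaceClass)); // prints "hello\n world"

// Restricting the class to spaces and tabs keeps the empty line.
console.log(JSON.stringify(trimIndent(body, 7))); // prints "hello\n\nworld"
```

If heredocs in the language never contain blank lines, the original pattern behaves the same as this one.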