Home > database >  Eagerness of separator parser in `sepBy` in Parsec
Eagerness of separator parser in `sepBy` in Parsec

Time:12-17

This is a question pertaining to a more general question of parser precedence in Haskell's Parsec, so feel free to respond with more generality about my problem.

Let's say I have a p :: Parsec String () Int which consumes every character in a String and sums its ASCII values at the end. I want to turn this p into another parser of the same type which computes the result of this ASCII summing on substrings which are separated by the newline character, and takes their product. E.g. AB\nZ\nCEF parses to (65 66)*(90)*(67 69 70)=2428740.

I could use lines on the input, map the elements to their parse values, and take their product, but I want to keep this Parsec-native; at the very least I would want a general function Parsec String u a -> Parsec [String] u [a].

I could use sepBy p newline, but this will only work if my p was specifically designed to reject parsing \n, which isn't very compositionally minded.

CodePudding user response:

Parsec doesn't explicitly manage precedence of its parsers. Rather, it combines parsers using various control structure combinators, like sequences or loops, so that a sequence p >> q means "first run parser p, and once it's successfully completed, run parser q; if either fails, fail the whole operation". Control over the input stream comes from the bottom up. A parser decides when it's done, whether it succeeds or fails, and how much of the input stream it consumes, and the parent combinator's control over a parser is basically limited to an all-or-nothing decision: either accept what the parser did or reject it entirely.

This means that a particular parser, like your p, that gobbles the rest of the stream, can't be controlled by a parent combinator, like sepBy. All sepBy can do is run it to completion, and then check if the separator parser succeeds before running p again.

The correct way to write a Parsec parser to do what you want is to modify p so that it rejects newlines. This may not strike you as "compositionally minded", but it's how Parsec works, and when you write more realistic Parsec parsers, you'll discover that the compositional aspects for Parsec parsers rely on individual parsers playing nicely with the "shared resource" of the input stream. A Parsec parser must make the correct local decision about how much of the input stream to parse. You'll also find that higher level combinators will frequently need to be written with non-local knowledge about their composite parsers, to avoid falling into "try hell". I'm thinking here specifically about ordering of parsers in a chain of <|> alternatives.

This is just a fundamental limitation of the Parsec design, and it's seen as a necessary trade off between "compositionality" and "practicality/efficiency".

That being said, don't forget that you're programming in a functional programming language, parsers are just values, and functions can create them. If you want a reusable version of p, try:

type Parser = Parsec String ()

sumWhile :: (Char -> Bool) -> Parser Int
sumWhile p = sum . map ord <$> many (satisfy p)

p = sumWhile (/= '\n')

This example isn't so great because, honestly, this component isn't worth reusing, but this technique is helpful for more complex reusable components.

CodePudding user response:

Generally the sub-parsers that are combined to make up a whole Parsec parser need to cooperate with each other. If a parser wants to consume all available input, there's nothing another parser can do to "sandbox" it within the context of a single, larger parser. If you want something that consumes all input until the next newline, you have to teach it to not consume the newline.

Note I qualified this with "in the context of a single, larger parser". This is because, as you suggest, you can work around this by doing stuff above the level of Parsec: call lines first on the input, for example, and then parse each line separately. There's no shame in doing this: pre-processing inputs before giving them to a parser is a grand old tradition. See lexers, for example.

You mentioned that you want at least "a general function Parsec String u a -> Parsec [String] u [a]". I'm not sure this would be so useful to you, because you couldn't combine the resulting Parsec [String] u [a] with any of the other Parsec String u a sub-parsers you have written for the rest of the task.

  • Related