Choosing between two parsers with a common prefix

Time:11-08

I am currently trying to write a simple parser in Parsec but keep running into problems with whitespace. As a minimal example, I have a parser that parses two letters: either two lowercase letters, or one uppercase letter followed by one lowercase letter. I would write it as

testP :: Parser String
testP = do
    lookAhead lower
    a1 <- lower
    a2 <- lower
    return [a1,a2]
    <|> do
    a1 <- upper
    a2 <- lower
    return [a1,a2]

This works as expected with strings like "as" or "Bs". Now I want to handle possible whitespace at the beginning of my input string. If I do

testP :: Parser String
testP = do
    spaces
    lookAhead lower
    a1 <- lower
    a2 <- lower
    return [a1,a2]
    <|> do
    a1 <- upper
    a2 <- lower
    return [a1,a2]

I would expect the program to be able to parse both " as" and " Bs" now, but for the second string I get the error "expecting space or lowercase letter" instead. Okay, I thought the spaces would be parsed regardless of which option is taken, but apparently not, so let's also put spaces at the start of the second option, like this:

testP :: Parser String
testP = do
    spaces
    lookAhead lower
    a1 <- lower
    a2 <- lower
    return [a1,a2]
    <|> do
    spaces
    a1 <- upper
    a2 <- lower
    return [a1,a2]

This still gives me the same error when I try to parse " Bs", though. How am I misunderstanding whitespace handling here and how could I do this correctly?

Answer:

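<|> will not try the second alternative if the first parser has consumed any input before failing. This is deliberate: because Parsec does not backtrack by default, it does not have to hold on to input it has already consumed, which prevents space leaks. It is one of the foundational design decisions of Parsec.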

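Once spaces has consumed some input, the choice is committed: the first alternative must now succeed, otherwise the second alternative is never tried and the whole parser fails. That is exactly the behavior you're observing with " Bs": spaces consumes the leading space, lookAhead lower fails on 'B', and the whole parser fails. Adding spaces to the start of the second alternative doesn't help either, because that alternative is never reached.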
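To see the rule in isolation, here is a small sketch assuming the parsec package and its Text.Parsec module (the names emptyFailure and consumedFailure are made up for illustration). In the first choice, the left alternative fails without consuming anything, so <|> falls through to the right one. In the second, the left alternative consumes 'a' before failing, so <|> commits to it and never tries the right alternative, even though that one would have matched "ac".

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Left alternative fails on "a" without consuming input,
-- so <|> goes on to try the right alternative.
emptyFailure :: Parser Char
emptyFailure = char 'x' <|> char 'a'

-- Left alternative consumes 'a' and then fails on 'c'.
-- Because input was consumed, <|> does not try the right alternative.
consumedFailure :: Parser Char
consumedFailure = (char 'a' >> char 'b') <|> (char 'a' >> char 'c')

main :: IO ()
main = do
  print (parse emptyFailure "" "a")      -- Right 'a'
  print (parse consumedFailure "" "ac")  -- Left <parse error>: right branch never tried
```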

You could get arbitrary lookahead with try, which makes <|> try the second alternative even if the first one consumed input, but you shouldn't need it here. spaces always succeeds (it skips zero or more whitespace characters) and is a common prefix of both alternatives, so just run it once, before the choice between them:

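import Text.Parsec
import Text.Parsec.String (Parser)

testP :: Parser String
testP = spaces *> (lowerPair <|> upperLower)
  where
    -- Two lowercase letters, e.g. "as". lookAhead is not needed:
    -- lower fails without consuming input when the next char is not lowercase.
    lowerPair = do
      a1 <- lower
      a2 <- lower
      return [a1, a2]
    -- An uppercase letter followed by a lowercase one, e.g. "Bs".
    upperLower = do
      a1 <- upper
      a2 <- lower
      return [a1, a2]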
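For comparison, here is a sketch of the try route applied to the question's third attempt, again assuming the parsec package (testTry is a hypothetical name). try makes a failed first alternative behave as if it had consumed nothing, so the second alternative re-reads the input from the leading space:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- The question's third attempt, with the first alternative wrapped in try.
-- If lowerPair fails after spaces consumed input, try rewinds to the start,
-- so upperLower still gets to run.
testTry :: Parser String
testTry = try lowerPair <|> upperLower
  where
    lowerPair = do
      spaces
      a1 <- lower
      a2 <- lower
      return [a1, a2]
    upperLower = do
      spaces
      a1 <- upper
      a2 <- lower
      return [a1, a2]

main :: IO ()
main = mapM_ (print . parse testTry "") [" as", " Bs"]
```

Running it prints Right "as" followed by Right "Bs". It works, but on the uppercase path the whitespace is scanned twice, and try has to keep the input from its starting point around until the branch resolves, which is the cost the default no-backtracking behavior avoids. Factoring spaces out in front of the choice is the simpler fix.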