I am currently trying to write a simple parser in Parsec but keep running into problems regarding whitespace: As a minimal example, I have a parser that will parse two letters, either two lowercase or one uppercase and one lowercase. I would do this as
testP :: Parser String
testP = do
    lookAhead lower
    a1 <- lower
    a2 <- lower
    return [a1,a2]
  <|> do
    a1 <- upper
    a2 <- lower
    return [a1,a2]
This works as expected with strings like "as" or "Bs". Now I want to handle possible whitespace at the beginning of my input string. If I do
testP :: Parser String
testP = do
    spaces
    lookAhead lower
    a1 <- lower
    a2 <- lower
    return [a1,a2]
  <|> do
    a1 <- upper
    a2 <- lower
    return [a1,a2]
I would expect the program to be able to parse both " as" and " Bs" now, but for the second string I get an error "expecting space or lowercase letter" instead.
Okay, I thought the spaces would be parsed regardless of which option is taken, but apparently not, so let's put another spaces at the start of the second option, like this:
testP :: Parser String
testP = do
    spaces
    lookAhead lower
    a1 <- lower
    a2 <- lower
    return [a1,a2]
  <|> do
    spaces
    a1 <- upper
    a2 <- lower
    return [a1,a2]
This still gives me the same error when I try to parse " Bs", though. How am I misunderstanding whitespace handling here and how could I do this correctly?
CodePudding user response:
<|> will not try the second alternative if the first parser fails after consuming any input at all. This is done to prevent space leaks, and it is one of the foundational design decisions of Parsec.
Once spaces consumes some input, the choice is made: that branch must now succeed, otherwise the alternative is not tried and the whole parser fails. This is why you're observing this behavior: spaces consumes some input, lookAhead lower fails, and the whole parser fails. The same applies to your last version, because the second alternative, with its own spaces, is never reached.
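To see the rule in isolation, here is a minimal sketch, assuming the parsec package and its Text.Parsec.String module. The name committed is hypothetical and only used for illustration; it is not part of the question's code.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical example parser: both alternatives skip whitespace themselves.
committed :: Parser Char
committed = (spaces *> lower) <|> (spaces *> upper)

main :: IO ()
main = do
  -- No leading space: the first alternative fails without consuming
  -- anything, so <|> moves on to the second alternative.
  print (parse committed "" "B")
  -- Leading space: spaces consumes it, lower then fails on 'B', and
  -- <|> does not try the second alternative. The result is a Left.
  print (parse committed "" " B")
```

The second call fails even though the second alternative on its own would accept " B", which is the same situation as in the question.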
You could get arbitrary lookahead with try, which ensures the second alternative is tried even if the first consumes input, but you shouldn't in this case. Here, spaces is a parser that cannot fail and that precedes both alternatives, so just run it once before either of them.
testP :: Parser String
testP = spaces *> (do
    lookAhead lower
    a1 <- lower
    a2 <- lower
    return [a1,a2]
  <|> do
    a1 <- upper
    a2 <- lower
    return [a1,a2])
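For reference, a self-contained sketch of the above that can be compiled and run, assuming the parsec package is available. testP is the parser from this answer. testPTry is a hypothetical name for the try-based variant mentioned earlier, included only for comparison and not as a recommendation.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Whitespace is skipped once, before the choice between alternatives.
testP :: Parser String
testP = spaces *> (do
    lookAhead lower
    a1 <- lower
    a2 <- lower
    return [a1,a2]
  <|> do
    a1 <- upper
    a2 <- lower
    return [a1,a2])

-- Hypothetical comparison: try rewinds the input when the first
-- alternative fails, so the second alternative gets to run.
testPTry :: Parser String
testPTry = try (do
    spaces
    a1 <- lower
    a2 <- lower
    return [a1,a2])
  <|> do
    spaces
    a1 <- upper
    a2 <- lower
    return [a1,a2]

main :: IO ()
main = do
  -- Every input below should parse with both versions.
  mapM_ (print . parse testP "") [" as", " Bs", "as", "Bs"]
  mapM_ (print . parse testPTry "") [" as", " Bs"]
```

The try version works, but it parses the leading whitespace twice whenever the first alternative fails, which is why hoisting spaces out of the choice is the simpler option here.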