Why does the order of these Haskell Parsec combinators matter?

Time:11-13

I want to make a simple parser to parse an addition expression. Here is my code:

import Text.Parsec.Char
import Text.Parsec.String
import Text.ParserCombinators.Parsec

data Expr = Number Float
          | Add Expr Expr
          deriving Show

number :: Parser Expr
number = do
    n <- try $ many1 digit
    return $ Number $ read n

add :: Parser Expr
add = do
    e1 <- number
    char ' '
    e2 <- number
    return $ Add e1 e2

expr :: Parser Expr
expr =  try number <|> try add
       

p :: String -> Either ParseError Expr
p = parse (do{e <- expr; eof; return e}) "error"

But here is the output

ghci> parse add "err" "1 2"
Right (Add (Number 1.0) (Number 2.0))
ghci> p "1"
Right (Number 1.0)
ghci> p "1 2"
Left "error" (line 1, column 2):
unexpected ' '
expecting digit or end of input

But if I change the order of the expr combinators to

expr :: Parser Expr
expr =  try add <|> try number

Then the output changes to

ghci> p "1 2"
Right (Add (Number 1.0) (Number 2.0))

Why does this happen? I thought the try combinator forces the parsers I am combining to restart after each <|>.

I plan on making this much larger so I want to be sure I understand why this is happening now.

My actual program is already larger, but this snippet reproduces the problem independently.

CodePudding user response:

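The problem you're facing is that when number is run on the string "1 2", it succeeds (admittedly, leaving " 2" unparsed). Because the first alternative succeeded, <|> never tries add. The use of try only matters when its parser fails: in that case it rewinds the input so the next alternative starts from the same position.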

Perhaps another way to show this is to consider the example try (string "a") <|> try (string "ab"). This will succeed on any string starting with the character a, but it will never consume the full "ab": the first alternative already succeeds after reading "a", so the second alternative is never tried.
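As a minimal sketch of that example (the names shortFirst, longFirst and parseAll are made up here for illustration, and the imports assume the parsec package), the two orderings can be compared directly:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- The first alternative succeeds on "a", so string "ab" is never tried.
shortFirst :: Parser String
shortFirst = try (string "a") <|> try (string "ab")

-- With the longer alternative first, it gets a chance to match;
-- if it fails, try rewinds the input and string "a" is attempted.
longFirst :: Parser String
longFirst = try (string "ab") <|> try (string "a")

-- Hypothetical helper: run a parser and require that all input is consumed.
parseAll :: Parser String -> String -> Either ParseError String
parseAll q = parse (q <* eof) "demo"
```

With these definitions, parse shortFirst "demo" "ab" should return Right "a" with the b left unparsed, parseAll shortFirst "ab" should fail at the b, and parseAll longFirst should accept both "ab" and "a".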

If you had tried instead

exprAll :: Parser Expr
exprAll =  try (number <* eof) <|> try (add <* eof)

then you may get the behavior you're looking for. In this case, the "try"d parser does not succeed until the end of input is reached, so when the ' ' is encountered, the parse attempt of number <* eof fails, try rewinds the input, and parsing starts over using add <* eof.
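For reference, here is a self-contained sketch of the whole parser with exprAll wired in. The wrapper name pAll is hypothetical, and Eq is derived only so that results can be compared; the rest follows the code from the question.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data Expr = Number Float
          | Add Expr Expr
          deriving (Show, Eq)

-- One or more digits, read as a Float.
number :: Parser Expr
number = Number . read <$> many1 digit

-- Two numbers separated by a single space.
add :: Parser Expr
add = do
    e1 <- number
    _ <- char ' '
    e2 <- number
    return (Add e1 e2)

-- Each alternative must reach end of input to succeed, so a number
-- followed by leftover input fails and add gets its turn.
exprAll :: Parser Expr
exprAll = try (number <* eof) <|> try (add <* eof)

-- Hypothetical wrapper; eof is already checked inside exprAll.
pAll :: String -> Either ParseError Expr
pAll = parse exprAll "error"
```

Under these assumptions the order of the alternatives should no longer matter for these inputs:

ghci> pAll "1"
Right (Number 1.0)
ghci> pAll "1 2"
Right (Add (Number 1.0) (Number 2.0))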
