I'm trying to write a simple parser via Haskell, but stuck at an infinite loop. the code is:
import Control.Applicative (Alternative, empty, many, (<|>))
data Parser a = Parser {runParser :: String -> [(a, String)]}
instance Functor Parser where
fmap f (Parser p) = Parser $ \s -> [(f x', s') | (x', s') <- p s]
instance Applicative Parser where
pure x = Parser $ \s -> [(x, s)]
(Parser pf) <*> (Parser p) = Parser $ \s -> [(f' x, ss') | (f', ss) <- pf s, (x, ss') <- p ss]
instance Alternative Parser where
empty = Parser $ \s -> []
(Parser p1) <|> (Parser p2) = Parser $ \s ->
case p1 s of
[] -> p2 s
xs -> xs
singleSpaceParser :: Parser Char
singleSpaceParser = Parser $ \s ->
( case s of
x : xs -> if x == ' ' then [(' ', xs)] else []
[] -> []
)
multiSpaceParser :: Parser [Char]
multiSpaceParser = many singleSpaceParser
I just load this file in ghci, and run:
runParser multiSpaceParser " 123"
I expect it to get [(" ", "123")]
, but actually it got an infinite loop
I used trace to debug, and it seems that many
is wrong
How can I fix this bug?
CodePudding user response:
Let's assume
many p = (:) <$> p <*> many p <|> pure []
and consider the call
many singleSpaceParser " 123"
(The string does not actually matter here, the many singleSpaceParser
call will always loop.)
One reduction step yields
((:) <$> singleSpaceParser <*> many singleSpaceParser <|> pure []) " 123"
Now observe that, in order to reduce the call to (<|>)
, we have to evaluate both arguments of (<|>)
to be of the shape Parser ...
.
Let's consider doing that for (:) <$> singleSpaceParser <*> many singleSpaceParser
. As both (<$>)
and (<*>)
are infixl 4
, this is an application of <*>
at the outermost level.
But now observe that in order to reduce (<*>)
, we again have to evaluate both arguments of (<*>)
to be of the shape Parser ...
, so in particular the recursive call many singleSpaceParser
.
This is where we get the infinite loop.
By switching data
to newtype
(or alternatively, at least avoiding aggressively pattern-matching on the Parser
constructor in all the second arguments), these problems can be avoided.