Home > Software design >  haskell parser combinator infinite loop
haskell parser combinator infinite loop

Time:06-28

I'm trying to write a simple parser via Haskell, but stuck at an infinite loop. the code is:

import Control.Applicative (Alternative, empty, many, (<|>))

data Parser a = Parser {runParser :: String -> [(a, String)]}

instance Functor Parser where
  fmap f (Parser p) = Parser $ \s -> [(f x', s') | (x', s') <- p s]

instance Applicative Parser where
  pure x = Parser $ \s -> [(x, s)]
  (Parser pf) <*> (Parser p) = Parser $ \s -> [(f' x, ss') | (f', ss) <- pf s, (x, ss') <- p ss]

instance Alternative Parser where
  empty = Parser $ \s -> []
  (Parser p1) <|> (Parser p2) = Parser $ \s ->
    case p1 s of
      [] -> p2 s
      xs -> xs

singleSpaceParser :: Parser Char
singleSpaceParser = Parser $ \s ->
  ( case s of
      x : xs -> if x == ' ' then [(' ', xs)] else []
      [] -> []
  )

multiSpaceParser :: Parser [Char]
multiSpaceParser = many singleSpaceParser

I just load this file in ghci, and run:

runParser multiSpaceParser "  123"

I expect it to get [(" ", "123")], but actually it got an infinite loop

I used trace to debug, and it seems that many is wrong

How can I fix this bug?

CodePudding user response:

Let's assume

many p = (:) <$> p <*> many p <|> pure []

and consider the call

many singleSpaceParser "  123"

(The string does not actually matter here, the many singleSpaceParser call will always loop.)

One reduction step yields

   ((:) <$> singleSpaceParser <*> many singleSpaceParser <|> pure []) "  123"

Now observe that, in order to reduce the call to (<|>), we have to evaluate both arguments of (<|>) to be of the shape Parser ....

Let's consider doing that for (:) <$> singleSpaceParser <*> many singleSpaceParser. As both (<$>) and (<*>) are infixl 4, this is an application of <*> at the outermost level.

But now observe that in order to reduce (<*>), we again have to evaluate both arguments of (<*>) to be of the shape Parser ..., so in particular the recursive call many singleSpaceParser.

This is where we get the infinite loop.

By switching data to newtype (or alternatively, at least avoiding aggressively pattern-matching on the Parser constructor in all the second arguments), these problems can be avoided.

  • Related