I am trying to make a parser for the following recursive datatype:
data Expr = Val Int
| Var Char
| App Op Expr Expr
deriving Show
data Op = Add | Sub | Mul | Div
deriving Show
It should, for example, parse "(1 (a / -2))"
as App Add (Val 1) (App Div (Var 'a') (Val (-2)))
. I've managed to write parsers for the Val
and Var
constructors as well as for Op
's constructors like so:
import Text.Regex.Applicative
import Data.Char
rNonnegativeIntegral :: (Read a, Integral a) => RE Char a
rNonnegativeIntegral = read <$> some (psym isDigit)
rNegativeIntegral :: (Read a, Integral a) => RE Char a
rNegativeIntegral = negate <$> (sym '-' *> rNonnegativeIntegral)
rIntegral :: (Read a, Integral a) => RE Char a
rIntegral = rNonnegativeIntegral <|> rNegativeIntegral
rVal :: RE Char Expr
rVal = Val <$> rIntegral
rVar :: RE Char Expr
rVar = Var <$> psym isAlpha
rOp = aux <$> (foldr1 (<|>) $ map sym " -*/")
where
aux ' ' = Add
aux '-' = Sub
aux '*' = Mul
aux '/' = Div
When this is loaded into ghci it can produce the following output:
ghci> findLongestPrefix rVal "-271"
Just (Val (-271), "")
ghci> findLongestPrefix rVar "a"
Just (Var 'a', "")
ghci> findLongestPrefix rOp "-"
Just (Sub, "")
The trouble comes when I introduce this recursive definition for the App
constructor:
whiteSpace :: RE Char String
whiteSpace = many $ psym isSpace
strictWhiteSpace :: RE Char String
strictWhiteSpace = some $ psym isSpace
rApp :: RE Char Expr
-- flip App :: Expr -> Op -> Expr
-- strictWhiteSpace after rOp to avoid conflict with rNegativeInteger
rApp = flip App <$> (sym '(' *> whiteSpace *> rExpr)
<*> (whiteSpace *> rOp <* strictWhiteSpace)
<*> (rExpr <* whiteSpace <* sym ')')
rExpr :: RE Char Expr
rExpr = rVal <|> rVar <|> rApp
This loads into ghci just fine, and all previous constructors still work. But findLongestPrefix rApp "(1 a)"
and many similar expressions cause ghci to hang and produce no output.
Through experimentation I've found that the issue happens in general when rExpr
is passed in as the first argument to <*
. For example, findLongestPrefix (rExpr <* whiteSpace) "a)"
also causes ghci to hang.
Also, when the definition for rExpr
is replaced by
rExpr = rVal <|> rVar
all of these hanging issues go away. Simple expressions like "(1 a)"
are able to be parsed, but support for recursive expressions is not available.
How can I implement a recursive parser here without hanging issues?
CodePudding user response:
The language of expressions that you describe isn't regular. So you'll have to use a different library.
Luckily, essentially the same parser structure should work fine with most other parser combinator libraries. It should be as simple as substituting your new library's name for a few basic parsers in place of their regex-applicative analogs.