Parse a predicate in Haskell-CodePudding

I would like to parse a predicate such as: "3 > 2" or "MyVar = 0".

Ideally, I would use a small GADT to represent the predicate:

data Expr a where
    I   :: Int  -> Expr Int
    B   :: Bool -> Expr Bool
    Var :: String -> Expr Int
    Add :: Expr Int -> Expr Int -> Expr Int
    Eq  :: Eq a => Expr a -> Expr a -> Expr Bool
    Mt  :: Eq a => Expr a -> Expr a -> Expr Bool

The expression 3 > 2 would parse as Mt (I 3) (I 2).

I tried to approach the problem with Parsec. However, the module Text.Parsec.Expr deals only with expressions, with type a -> a -> a. Any suggestions?

CodePudding user response：

Parsing directly into a GADT is actually kind of tricky. In my experience, it's usually better to parse into an untyped ADT first (where the a -> a -> a types are a natural fit), and then separately "type check" the ADT by transforming it into the desired GADT. The main disadvantage is that you have to define two parallel types for the untyped and typed abstract syntax trees. (You can technically get around this with some type level tricks, but it's not worth it for a small language.) However, the resulting design is easier to work with and generally more flexible.

In other words, I'd suggest using Parsec to parse into the untyped ADT:

data UExpr where
  UI :: Int -> UExpr
  UB :: Bool -> UExpr
  UVar :: String -> UExpr
  UAdd :: UExpr -> UExpr -> UExpr
  UEq :: UExpr -> UExpr -> UExpr
  UMt :: UExpr -> UExpr -> UExpr

and then write a type checker:

tc :: UExpr -> Expr a

Actually, you won't be able to write a tc like that. Instead you'll need to break it up into mutually recursive type checkers for different expression types:

tc_bool :: UExpr -> Expr Bool
tc_int :: UExpr -> Expr Int

and you'll probably want to run them in a Reader monad that provides a list of valid variables. (Type checking normally involves checking the types of variabels. In your case, you only have integer variables, but it still makes sense to ensure the variables are defined at the type checking stage.)

If you get stuck, a full solution follows...

SPOILERS

As I say, I'd first write a Parsec parser for an untyped UExpr ADT. Note that the Text.Parsec.Expr machinery works fine for UExpr -> UExpr -> UExpr operators:

{-# LANGUAGE GADTs #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE FlexibleContexts #-}
{-# OPTIONS -Wall -Wno-missing-signatures #-}

import Text.Parsec
import Text.Parsec.Expr
import Text.Parsec.String
import Text.Parsec.Language
import Control.Monad.Reader
import Control.Exception
import Data.Maybe (fromJust)
import qualified Text.Parsec.Token as P

lexer = P.makeTokenParser haskellDef { P.reservedNames = ["true","false"] }

identifier = P.identifier lexer
integer = P.integer lexer
parens = P.parens lexer
reserved = P.reserved lexer
reservedOp = P.reservedOp lexer
symbol = P.symbol lexer

data UExpr where
  UI :: Int -> UExpr
  UB :: Bool -> UExpr
  UVar :: String -> UExpr
  UAdd :: UExpr -> UExpr -> UExpr
  UEq :: UExpr -> UExpr -> UExpr
  UMt :: UExpr -> UExpr -> UExpr
  deriving (Show)

expr :: Parser UExpr
expr = buildExpressionParser
  [ [Infix (UAdd <$ reservedOp " ") AssocLeft]
  , [Infix (UEq <$ reservedOp "=") AssocNone, Infix (UMt <$ reservedOp ">") AssocNone]
  ] term

term :: Parser UExpr
term = parens expr
  <|> UI . fromIntegral <$> integer
  <|> UB True <$ reserved "true"
  <|> UB False <$ reserved "false"
  <|> UVar <$> identifier

test_parser :: IO ()
test_parser = do
  parseTest expr "3 > 2"
  parseTest expr "MyVar = 0"

Then, I'd write a type checker, probably something like the following. Note that for type checking, we only need to verify that the variable names exist; we don't need their values. However, I've used a single Ctx type for both type checking and evaluation.

-- variable context (i.e., variable name/value pairs)
type Ctx = [(String, Int)]

data Expr a where
  I   :: Int  -> Expr Int
  B   :: Bool -> Expr Bool
  Var :: String -> Expr Int
  Add :: Expr Int -> Expr Int -> Expr Int
  Eq  :: (Show (Expr a), Eq a) => Expr a -> Expr a -> Expr Bool
  Mt  :: (Show (Expr a), Ord a) => Expr a -> Expr a -> Expr Bool
deriving instance Show (Expr Bool)
deriving instance Show (Expr Int)

tc_bool :: UExpr -> Reader Ctx (Expr Bool)
tc_bool (UB b) = pure $ B b
tc_bool (UEq x y) = Eq <$> tc_int x <*> tc_int y
tc_bool (UMt x y) = Mt <$> tc_int x <*> tc_int y
tc_bool _ = error "type error: expecting a boolean expression"

tc_int :: UExpr -> Reader Ctx (Expr Int)
tc_int (UI n) = pure $ I n
tc_int (UVar sym)
  = do mval <- asks (lookup sym)
       case mval of Just _ -> pure $ Var sym
                    _      -> error "type error: undefined variables"
tc_int (UAdd x y) = Add <$> tc_int x <*> tc_int y
tc_int _ = error "type error: expecting an integer expression"

test_tc :: IO ()
test_tc = do
  print $ run_tc_bool (UMt (UI 3) (UI 2))
  print $ run_tc_bool (UEq (UVar "MyVar") (UI 0))
  -- now some type errors
  handle showError $ print $ run_tc_bool (UMt (UB False) (UI 2))
  handle showError $ print $ run_tc_bool (UAdd (UEq (UI 1) (UI 1)) (UI 1))

  where showError :: ErrorCall -> IO ()
        showError e = print e

        run_tc_bool e = runReader (tc_bool e) [("MyVar", 42)]

You may be surprised to learn that the most natural way to write a type checker doesn't actually "use" the GADT. It could have been equally easily written using two separate types for boolean and integer expressions. You would have found the same thing if you'd actually tried to parse directly into the GADT. The parser code would need to be pretty cleanly divided between a parser for boolean expressions of type Parser (Expr Bool) and a parser for integer expressions of type Parser (Expr Int), and there'd be no straightforward way to write a single Parser (Expr a).

Really, the advantage of the GADT representation only comes at the evaluation stage where you can write a simple, type-safe evaluator that triggers no "non-exhaustive pattern" warnings, like so:

eval :: Expr a -> Reader Ctx a
eval (I n) = pure n
eval (B b) = pure b
eval (Var sym) = fromJust <$> asks (lookup sym)
eval (Add x y) = ( ) <$> eval x <*> eval y
eval (Eq x y) = (==) <$> eval x <*> eval y
eval (Mt x y) = (>) <$> eval x <*> eval y

test_eval :: IO ()
test_eval = do
  print $ run_eval (Mt (I 3) (I 2))
  print $ run_eval (Eq (Var "MyVar") (I 0))

  where run_eval e = runReader (eval e) [("MyVar", 42)]