Home > Software design >  Convert Sum Type to Phantom Type via Type Classes
Convert Sum Type to Phantom Type via Type Classes

Time:01-03

I was experimenting with phantom types in Haskell. My goal is to convert the LangCode Type to it's corresponding Phantom Type representation via Type Classes for example DE to Lang DE.

module Main (main) where

import Data.Proxy (Proxy(..))

data DE
data EN

data LangCode
  = DE
  | EN
  deriving (Eq, Show)

type Lang a = Proxy a

de :: Lang DE
de = Proxy

en :: Lang EN
en = Proxy

class ToLangCode a where
  toLangCode :: Lang a -> LangCode

instance ToLangCode DE where
  toLangCode _ = DE
instance ToLangCode EN where
  toLangCode _ = EN

class FromLangCode a where
  fromLangCode :: LangCode -> Lang a

instance FromLangCode DE where
  fromLangCode DE = Proxy
instance FromLangCode EN where
  fromLangCode EN = Proxy

main :: IO ()
main = do
  print $ de -- Output => Proxy
  print $ en -- Output => Proxy

  print $ toLangCode de -- Output => DE
  print $ toLangCode en -- Output => EN

  -- works
  print $ (fromLangCode DE :: Lang DE) -- Output => Proxy
  print $ (fromLangCode EN :: Lang EN) -- Output => Proxy

  -- throws an error
  print $ fromLangCode DE -- Output => Proxy
  print $ fromLangCode EN -- Output => Proxy

With a type annotation it works fine. But without it I get this error.

[1 of 1] Compiling Main             ( main.hs, main.o )

main.hs:50:11: error:
    * Ambiguous type variable `a0' arising from a use of `fromLangCode'
      prevents the constraint `(FromLangCode a0)' from being solved.
      Probable fix: use a type annotation to specify what `a0' should be.
      These potential instances exist:
        instance FromLangCode DE -- Defined at main.hs:32:10
        instance FromLangCode EN -- Defined at main.hs:34:10
    * In the second argument of `($)', namely `fromLangCode DE'
      In a stmt of a 'do' block: print $ fromLangCode DE
      In the expression:
        do print $ de
           print $ en
           print $ toLangCode de
           print $ toLangCode en
           ....
   |
50 |   print $ fromLangCode DE -- Output => Proxy
   |           ^^^^^^^^^^^^^^^

main.hs:51:11: error:
    * Ambiguous type variable `a1' arising from a use of `fromLangCode'
      prevents the constraint `(FromLangCode a1)' from being solved.
      Probable fix: use a type annotation to specify what `a1' should be.
      These potential instances exist:
        instance FromLangCode DE -- Defined at main.hs:32:10
        instance FromLangCode EN -- Defined at main.hs:34:10
    * In the second argument of `($)', namely `fromLangCode EN'
      In a stmt of a 'do' block: print $ fromLangCode EN
      In the expression:
        do print $ de
           print $ en
           print $ toLangCode de
           print $ toLangCode en
           ....
   |
51 |   print $ fromLangCode EN -- Output => Proxy
   |           ^^^^^^^^^^^^^^^
exit status 1

My question is. Is it possible to implement it in a way so that the type annotations are no longer needed?

Updated version

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE FlexibleInstances #-}

module Main (main) where

data LangCode = DE | EN deriving (Eq, Show)

data SLangCode a where
    SDE :: SLangCode DE
    SEN :: SLangCode EN

data Lang (a :: LangCode) where
  LangDE :: Lang 'DE
  LangEN :: Lang 'EN

deriving instance Show (Lang 'DE)
deriving instance Show (Lang 'EN)

slcDE :: SLangCode a -> Lang 'DE -> Lang a
slcDE SDE t = t
slcDE SEN _ = LangEN

slcEN :: SLangCode a -> Lang 'EN -> Lang a
slcEN SEN t = t
slcEN SDE _ = LangDE

class ToLangCode a where
  toLangCode :: Lang a -> LangCode

instance ToLangCode 'DE where
  toLangCode _ = DE
instance ToLangCode 'EN where
  toLangCode _ = EN

class FromLangCode a where
  fromLangCode :: SLangCode a -> LangCode -> Lang a

instance FromLangCode 'DE where
  fromLangCode SDE DE = LangDE
instance FromLangCode 'EN where
  fromLangCode SEN EN = LangEN

main :: IO ()
main = do
  print $ toLangCode LangDE -- Output => DE
  print $ toLangCode LangEN -- Output => EN

  print $ fromLangCode SDE DE -- Output => LangDE
  print $ fromLangCode SEN EN -- Output => LangEN

CodePudding user response:

Here's the problem. It's a fundamental property of types that if the term-level expressions e1 and e2 have the same type t, then substituting e1 with e2 in a program doesn't change any of the program's types. That's what makes them types.

You want the expression (with no explicit type signature):

fromLangCode EN

to have type Lang EN, and that's easy enough. But if the term-level expressions EN and DE are constructors from the same sum type, LangCode, then substituting one for the other won't change any types, so the expression:

fromLangCode DE

will still have type Lang EN, which is clearly not what you want.

So, if you want two different inferred types:

fromLangCode EN :: Lang EN
fromLangCode DE :: Lang DE

then any solution is going to require that the term-level expressions EN and DE have different types, which means that you can't have:

data LangCode = EN | DE

So, that's the short answer -- you can't freely convert between a sum type LangCode and proxies of type Lang lang. Or rather, you can easily convert from type Lang lang to term LangCode using a type class (like ToLangCode), but you can't really convert back.

There are many solutions to this "problem", but it kind of depends on what you're trying to do, which is why folks in the comments are asking you questions about "use cases" and "expected behavior".

A Simple Solution that Accomplishes Nothing

One simple solution is to write:

data EN = EN
data DE = DE

Here, the term-level expressions EN and DE have different types (EN and DE respectively). This allows you to easily implement the desired interface using your main function verbatim:

import Data.Proxy

data EN = EN deriving (Show)
data DE = DE deriving (Show)

type Lang a = Proxy a

de :: Lang DE
de = Proxy

en :: Lang EN
en = Proxy

class ToLangCode a where
  toLangCode :: Lang a -> a

instance ToLangCode DE where
  toLangCode _ = DE
instance ToLangCode EN where
  toLangCode _ = EN

class FromLangCode a where
  fromLangCode :: a -> Lang a

instance FromLangCode DE where
  fromLangCode DE = Proxy
instance FromLangCode EN where
  fromLangCode EN = Proxy

main :: IO ()
main = do
  print $ de -- Output => Proxy
  print $ en -- Output => Proxy

  print $ toLangCode de -- Output => DE
  print $ toLangCode en -- Output => EN

  -- works
  print $ (fromLangCode DE :: Lang DE) -- Output => Proxy
  print $ (fromLangCode EN :: Lang EN) -- Output => Proxy

  -- works fine now
  print $ fromLangCode DE -- Output => Proxy
  print $ fromLangCode EN -- Output => Proxy

If you're suspicious of this "solution", you're right to be. It doesn't really accomplish anything, since the term-level expressions EN and DE are already distinct at the type level, and this program is really just converting between one type level representation (types EN and DE) and another (types Lang EN and Lang DE).

Using a GADT

One way to kind of do what you want is to use a GADT. If we define LangCode as a "generalized" sum type:

{-# LANGUAGE GADTs #-}

data EN
data DE

data LangCode lang where
  EN :: LangCode EN
  DE :: LangCode DE

everything works more or less like my previous example, with a few minor changes to type signatures, and main left unchanged, as below:

{-# LANGUAGE GADTs #-}
{-# LANGUAGE StandaloneDeriving #-}

import Data.Proxy

data EN
data DE

data LangCode lang where
  EN :: LangCode EN
  DE :: LangCode DE
deriving instance Show (LangCode a)

type Lang a = Proxy a

de :: Lang DE
de = Proxy

en :: Lang EN
en = Proxy

class ToLangCode a where
  toLangCode :: Lang a -> LangCode a

instance ToLangCode DE where
  toLangCode _ = DE
instance ToLangCode EN where
  toLangCode _ = EN

class FromLangCode a where
  fromLangCode :: LangCode a -> Lang a

instance FromLangCode DE where
  fromLangCode DE = Proxy
instance FromLangCode EN where
  fromLangCode EN = Proxy

main :: IO ()
main = do
  print $ de -- Output => Proxy
  print $ en -- Output => Proxy

  print $ toLangCode de -- Output => DE
  print $ toLangCode en -- Output => EN

  -- works
  print $ (fromLangCode DE :: Lang DE) -- Output => Proxy
  print $ (fromLangCode EN :: Lang EN) -- Output => Proxy

  -- works fine now
  print $ fromLangCode DE -- Output => Proxy
  print $ fromLangCode EN -- Output => Proxy

So, we can freely convert between this generalized sum type and a phantom type representation.

This really isn't a big improvement over the previous example. The two constructors are now formally part of a generalized sum type, but the term-level expressions EN and DE are already distinct at the type level, and we're just converting between one type-level representation (types LangCode EN and LangCode DE) and another (types Lang EN and Lang DE).

However, the same criticism could be levelled at your "updated example". By introducing singletons (a generalized sum type), you too are already making the expressions fromLangCode SDE DE and fromLangCode SEN EN distinct at the type-level in that first, singleton argument. The second, term-level argument plays no useful role here and could be eliminated, so you're just converting from one type-level representation (SLangCode DE versus SLangCode EN) to another (Lang DE versus Lang EN).

Existental Types

Actual useful conversion between term and type-level representations usually involves an existential type somewhere in the mix. It helps to consider a slightly more realistic example. Suppose you might want to be able to use the type system to help avoid inappropriately mixing languages. For example:

import Data.List.Extra

data EN
data DE
newtype Text lang = Text String deriving (Show)

item :: Text EN
item = Text "The big elephant"

artikel :: Text DE
artikel = Text "Die großen Elefanten"

fixß :: Text DE -> Text DE
fixß (Text x) = Text $ replace "ß" "ss" x

pluralize :: Text EN -> Text EN
pluralize (Text noun) | "s" `isSuffixOf` noun = Text $ noun    "ses"
                      | otherwise = Text $ noun    "s"

message_en :: Text EN
message_en = pluralize item

message_de :: Text DE
message_de = fixß artikel

-- type system prevents applying german manipulations to english text
type_error_1 = fixß item

But, you might also like to make a run-time decision about which language is being used in a particular expression:

data LangCode = EN | DE

main :: IO ()
main = do
  let language = EN  -- assume this comes from args or user input
  -- type error: `mytext` can't be both `Text EN` and `Text DE`
  let mytext = case language of
        EN -> message_en
        DE -> message_de
  print mytext

This doesn't work because the type of mytext can't depend on a runtime computation. That is, there's no simple way to convert the runtime term-level value language :: LangCode (a sum type) to a type-level value Lang language, the desired type of mytext.

The usual solution in is to use an existential type:

{-# LANGUAGE ExistentialQuantification #-}
data SomeText = forall lang. SomeText (Text lang)

Here, the type SomeText represents text in some unspecified language (i.e., a value of type Text lang for some unspecified type lang). Now, mytext can be assigned text in a runtime-determined language by wrapping it with the SomeText constructor.

let mytext = case language of
      EN -> SomeText message_en
      DE -> SomeText message_de

We are limited in what we can do with a SomeText value like mytext -- we can't do anything that depends on knowing the language, like applying pluralize or fixß or whatever. However, one thing we can do is extract the string, since that works for any language:

getText :: SomeText -> String
getText (SomeText (Text str)) = str

which allows us to write a useful main:

main :: IO ()
main = do
  let language = EN
  let mytext = case language of
        EN -> SomeText message_en
        DE -> SomeText message_de
  print $ getText mytext

Here's the full working example:

{-# LANGUAGE ExistentialQuantification #-}

import Data.List.Extra

data EN
data DE
data LangCode = EN | DE
newtype Text lang = Text String deriving (Show)

data SomeText = forall lang. SomeText (Text lang)

item :: Text EN
item = Text "The big elephant"

artikel :: Text DE
artikel = Text "Die großen Elefanten"

fixß :: Text DE -> Text DE
fixß (Text x) = Text $ replace "ß" "ss" x

pluralize :: Text EN -> Text EN
pluralize (Text noun) | "s" `isSuffixOf` noun = Text $ noun    "ses"
                      | otherwise = Text $ noun    "s"

message_en :: Text EN
message_en = pluralize item

message_de :: Text DE
message_de = fixß artikel

getText :: SomeText -> String
getText (SomeText (Text str)) = str

main :: IO ()
main = do
  let language = DE
  let mytext = case language of
        EN -> SomeText message_en
        DE -> SomeText message_de
  print $ getText mytext

What we've done here is successfully converted a term-level value from a sum type (language) to a type-level value Text language by wrapping it in a SomeText constructor.

Applied to Your Example

We can use the same technique to convert freely between a sum type and type-level proxies, by shielding the type-level proxies in an existential. It might look something like this. Note how I've used a custom Show instance to differentiate proxies of different types, to prove we're doing something useful.

{-# LANGUAGE ExistentialQuantification #-}

import Data.Proxy

data DE
data EN

-- sum type
data LangCode
  = DE
  | EN
  deriving (Eq, Show)

-- existential for type-level proxies
type Lang = Proxy
data SomeLang = forall a. ToLangCode a => SomeLang (Lang a)
instance Show SomeLang where
  show (SomeLang lang) = "SomeLang Proxy<"    show (toLangCode' lang)    ">"

-- convert from LangCode to SomeLang
fromLangCode :: LangCode -> SomeLang
fromLangCode EN = SomeLang (Proxy :: Lang EN)
fromLangCode DE = SomeLang (Proxy :: Lang DE)

-- convert from SomeLang to LangCode
class ToLangCode lang where
  toLangCode' :: Lang lang -> LangCode
instance ToLangCode EN where
  toLangCode' Proxy = EN
instance ToLangCode DE where
  toLangCode' Proxy = DE
toLangCode :: SomeLang -> LangCode
toLangCode (SomeLang lang) = toLangCode' lang

de :: SomeLang
de = SomeLang (Proxy :: Lang DE)

en :: SomeLang
en = SomeLang (Proxy :: Lang EN)

main :: IO ()
main = do
  print $ de -- Output => SomeLang Proxy<DE>
  print $ en -- Output => SomeLang Proxy<EN>

  print $ toLangCode de -- Output => DE
  print $ toLangCode en -- Output => EN

  print $ fromLangCode DE -- Output => SomeLang Proxy<DE>
  print $ fromLangCode EN -- Output => SomeLang Proxy<EN>
  • Related