Return type polymorphism in Haskell

I'm trying to understand polymorphism in Haskell. Given the typical example below:

module Main where

data Dog = Dog
data Cat = Cat

class Animal a where
    speak :: a -> String
    getA :: a

instance Animal Dog where
    speak _ = "Woof"
    getA = Dog

instance Animal Cat where
    speak _ = "Meow"
    getA = Cat

doA animal = do
    putStrLn $ speak animal

main :: IO ()
main = do
    doA Dog
    doA Cat
    doA (getA :: Dog)

I have the getA function, which is part of the Animal type class, and it works as expected: I can use getA as long as I provide a type annotation, just as with read.

However, when I try to define a standalone function like the one below, it doesn't compile. Why is this an error?

getA' :: Animal a => a
getA' = if True then Dog else Cat

Why does the independent function getA' not work while getA does?

Answer:

This is a very common mistake: overlooking the direction of the polymorphism.

In short: it's the caller of the function that gets to choose type parameters, not the implementer of the function.

Slightly longer: when you give your function a signature like Animal a => a, you're making a promise to any caller of your function, and that promise reads something like this: "Pick a type. Any type. Whatever type you want. Let's call it a. Now make sure there is an instance Animal a. Now I can return you a value of type a."

So you see, when you write such a function, you don't get to return a specific type that you choose. You have to return whatever type the caller of your function will choose later when they call it.
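To make the direction concrete, here is a minimal sketch of a standalone function that does type-check with the signature Animal a => a. The name anyAnimal is made up for illustration. It compiles because its body never commits to Dog or Cat; it defers to the class method for whatever type the caller picks.

```haskell
module Main where

data Dog = Dog
data Cat = Cat

class Animal a where
    speak :: a -> String
    getA :: a

instance Animal Dog where
    speak _ = "Woof"
    getA = Dog

instance Animal Cat where
    speak _ = "Meow"
    getA = Cat

-- 'anyAnimal' is a hypothetical name, not part of the question.
-- The body works for every 'a' with an Animal instance, so the
-- promise made by the signature is kept.
anyAnimal :: Animal a => a
anyAnimal = getA

main :: IO ()
main = do
    putStrLn (speak (anyAnimal :: Dog))  -- the caller picks Dog
    putStrLn (speak (anyAnimal :: Cat))  -- the caller picks Cat
```

The annotations at the call sites are where the type gets chosen, which is the same reason read needs one.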

To drive it home with a specific example, imagine that your getA' function is possible, and then consider this code:

data Giraffe = Giraffe

instance Animal Giraffe where
  speak _ = "Huh?"
  getA = Giraffe

myGiraffe :: Giraffe
myGiraffe = getA'  -- does this work? how?

With a type class method this works, because the caller is not calling one single function. There are separate functions, one for Dog and another for Cat, that just happen to share the same name.

When the caller gets around to calling one of these functions, they need to somehow choose which one. This can be done in two ways: either (1) they know the exact type they want, and then the compiler can look up the corresponding function for that type, or (2) somebody else has somehow passed an Animal instance to them, and it's that instance that contains a reference to the function.
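The two ways of choosing can be sketched roughly as follows. The helper names dogSound, speakTwice and sibling are invented for this example. In the first case the type is written out and the compiler finds the instance directly; in the other two the function has an Animal a constraint, so whoever calls it supplies the instance.

```haskell
module Main where

data Dog = Dog
data Cat = Cat

class Animal a where
    speak :: a -> String
    getA :: a

instance Animal Dog where
    speak _ = "Woof"
    getA = Dog

instance Animal Cat where
    speak _ = "Meow"
    getA = Cat

-- (1) The exact type is known here, so the compiler can pick
-- the Dog version of 'speak' on the spot.
dogSound :: String
dogSound = speak Dog

-- (2) The type is not known here. The 'Animal a' constraint means
-- the caller hands over the instance along with the value.
speakTwice :: Animal a => a -> String
speakTwice x = speak x ++ " " ++ speak x

-- Also (2): 'getA' is resolved through the instance that came in
-- with the argument, so the result has the same type as the input.
sibling :: Animal a => a -> a
sibling _ = getA

main :: IO ()
main = do
    putStrLn dogSound
    putStrLn (speakTwice Cat)
    putStrLn (speak (sibling Cat))
```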

Now, if what you really wanted to do was to create a system where there can be a limited number of animals (i.e. just Cat and Dog), and the getA' function would return one of them, depending on reasons, then what you're looking for is not a type class, but just an ADT, like this:

data Animal = Cat | Dog

speak :: Animal -> String
speak Cat = "Meow"
speak Dog = "Woof"

getA' :: Animal
getA' = if True then Dog else Cat

Here, the function getA' will work just fine, because both Cat and Dog are values of the same type Animal. All types are known up front; nothing is generic.
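As a small sketch of the "depending on reasons" part, the choice can be driven by an ordinary runtime value. The function pickAnimal and its Bool parameter are assumptions made for this example only.

```haskell
module Main where

data Animal = Cat | Dog

speak :: Animal -> String
speak Cat = "Meow"
speak Dog = "Woof"

-- 'pickAnimal' is a hypothetical helper: the result type is always
-- Animal, only the value differs, so both branches type-check.
pickAnimal :: Bool -> Animal
pickAnimal likesWalks = if likesWalks then Dog else Cat

main :: IO ()
main = mapM_ (putStrLn . speak . pickAnimal) [True, False]
```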

Q: Ok, but this way, if I want to add Giraffe, I can't do it later, in another module, I have to modify the Animal type. Can't I have it both ways?

Short answer: no. This is a well-known problem, called "The Expression Problem", and the basic idea is that you can either have everything known upfront ("closed world"), or you get to add more things later ("open world"), but you can't have both at the same time. Duh!

But in Haskell, you still sorta can. But not really. This is a bit more advanced, so please ignore if it seems confusing.

What you can do is add another type, which will contain an animal value plus its Animal instance. Both wrapped up in a box. It looks like this:

-- Needs this pragma at the top of the file:
{-# LANGUAGE GADTs #-}

data SomeAnimal where
  SomeAnimal :: Animal a => a -> SomeAnimal

Then you can construct values of this type by wrapping Cat or Dog:

aCat :: SomeAnimal
aCat = SomeAnimal Cat

aDog :: SomeAnimal
aDog = SomeAnimal Dog

Note that both aCat and aDog are of the same type SomeAnimal. This is the key point. They're values of different types wrapped inside the box that looks the same from the outside, and the box also contains their respective Animal instance.

And this means that, if you unbox the box, you get the value and its Animal instance, which in turn means that you get to use the Animal methods. For example:

someSpeak :: SomeAnimal -> String
someSpeak (SomeAnimal a) = speak a

And with this, you can implement your getA' function this way:

getA' :: SomeAnimal
getA' = if True then SomeAnimal Dog else SomeAnimal Cat
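Putting the pieces together, a self-contained sketch might look like the one below. It assumes the GADTs extension, trims the class down to speak for brevity, and adds a zoo list that is not part of the original code. The list shows that differently typed animals can sit side by side once they are boxed.

```haskell
{-# LANGUAGE GADTs #-}
module Main where

data Dog = Dog
data Cat = Cat
data Giraffe = Giraffe

-- Trimmed to a single method to keep the example short.
class Animal a where
    speak :: a -> String

instance Animal Dog where
    speak _ = "Woof"

instance Animal Cat where
    speak _ = "Meow"

instance Animal Giraffe where
    speak _ = "Huh?"

-- The box: some value plus the Animal instance for its type.
data SomeAnimal where
    SomeAnimal :: Animal a => a -> SomeAnimal

-- Unboxing brings the instance back into scope, so 'speak' works.
someSpeak :: SomeAnimal -> String
someSpeak (SomeAnimal a) = speak a

getA' :: SomeAnimal
getA' = if True then SomeAnimal Dog else SomeAnimal Cat

-- 'zoo' is a hypothetical addition: three different types, one list.
zoo :: [SomeAnimal]
zoo = [SomeAnimal Dog, SomeAnimal Cat, SomeAnimal Giraffe]

main :: IO ()
main = do
    putStrLn (someSpeak getA')
    mapM_ (putStrLn . someSpeak) zoo
```

Note that after boxing, the only things you can do with the contents are the Animal methods; the original concrete type is no longer visible.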

However, you still get "The Expression Problem", because I actually lied a little: it's not about "closed world" vs. "open world", it's about extending the set of operations vs. extending the set of possible values. One will always be easy, and the other hard (look up the Expression Problem for details).

And this applies to this case too:

- If you make Cat and Dog values of the same type, you get to easily add more functions, but if you want to add more animals, you have to find all those functions you already made and modify them. Hard.
- If you make them different types and go the SomeAnimal route to unify them, you get to easily add more animals: just make a type and implement the Animal class. But if you want to add more functions, you have to go through all those animals you already made and add implementations for the new function to each of their Animal instances.
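The trade-off can be seen side by side in a rough sketch. To fit both designs in one file, the ADT version is renamed to Pet with constructors CatP and DogP, and legsPet is an invented operation; none of these names come from the code above.

```haskell
module Main where

-- Design 1: one closed type. New operations are easy.
data Pet = CatP | DogP

speakPet :: Pet -> String
speakPet CatP = "Meow"
speakPet DogP = "Woof"

-- Added later without touching 'speakPet'. Adding a new constructor
-- instead would mean revisiting every function like this one.
legsPet :: Pet -> Int
legsPet CatP = 4
legsPet DogP = 4

-- Design 2: one type per animal plus a class. New animals are easy.
data Dog = Dog
data Cat = Cat

class Animal a where
    speak :: a -> String

instance Animal Dog where
    speak _ = "Woof"

instance Animal Cat where
    speak _ = "Meow"

-- Added later without touching Dog or Cat. Adding a new class method
-- instead would mean revisiting every instance.
data Giraffe = Giraffe

instance Animal Giraffe where
    speak _ = "Huh?"

main :: IO ()
main = do
    print (map legsPet [CatP, DogP])
    putStrLn (speak Giraffe)
```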