Superclass constraints vs. instance declarations in Haskell-CodePudding

It kind of seems like in Haskell one can use instance declarations similarly to superclass constraints. I'll give an example using Semigroup and Monoid from Prelude. Here's a simplified definition of Semigroup:

class Semigroup a where
        (<>) :: a -> a -> a

Here's a simplified version of Prelude's definition of Monoid which uses a superclass constraint:

class Semigroup a => Monoid a where
        mempty  :: a
        mappend :: a -> a -> a
        mappend = (<>)

If I'm not mistaken it seems the superclass constraint could be replaced by an instance declaration as follows:

class Monoid a where
        mempty  :: a
        mappend :: a -> a -> a

instance Monoid a => Semigroup a where
        (<>) = mappend

Is there a reason for Haskell to have superclass constraints when it seems one can do the same with instance declarations (with constraints)? Which one should be preferred? Is it superclass constraints whenever possible?

CodePudding user response：

If you write instance Monoid a => Semigroup a you're essentially declaring Semigroup to be a synonym of Monoid. With that instance in place, all Monoid instances will be Semigroup instances – so far it's like the real superclass relation. But vice versa any constraint Semigroup a appearing anywhere in your code will immediately match the instance head, i.e. effectively you also have the relationship the other way around: the only way for a type to be an instance of Semigroup happens to be by first being an instance of Monoid, which is of course not what you wanted.

There's a way to circumvent this: you could still have overlapping instances that instantiate Semigroup without Monoid. Or rather, you could make it

instance {-# OVERLAPPABLE #-} Monoid a => Semigroup a

But I would advise against this: overlapping instance are IMO fundamentally violating the principle of how typeclasses work in Haskell, they regularly lead to weird surprises, and it's unnecessary because Haskell already has a good way of declaring this relationship: by making Semigroup a superclass of Monoid!

There is one caveat to this, and it admittedly a situation that turns up quite a lot: if you can't control class B, and you write your own class A, and it should be a superclass of B but you can't change B. In fact, exactly that has happened historically: before GHC-8, the Monoid class had no superclass, and Semigroup was defined in a separate package.

I don't really know a good solution for that situation. Best is if you can convince the maintainer of B to pull your A superclass into the package to have the proper A => B superclass relationship. Short of that, it can sometimes help to write a new class

class (A x, B x) => B' x
instance (A x, B x) => B' x

CodePudding user response：

This instance has a main downside:

instance Monoid a => Semigroup a where
        (<>) = mappend

First, the positive side: the instance does cause each monoid type to become a semigroup, as wanted.

However, it is not accepted unless we enable UndecidableInstances. It is not a huge downside, since this is a mostly benign extension, but it is something that has to be kept in mind.

The main issue, though, is that the instance prevents any other type from being a semigroup. More precisely, it prevents us to define any semigroup that is not a monoid.

This is because once we have an instance of the form

instance ... => Semigroup a where
   ...

then this will overlap with any other instance for Semigroup, triggering errors.

data T = T
instance Semigroup T where
   ...

Overlapping instances for Semigroup T
     Matching instances:
       instance Monoid a => Semigroup a
       instance Semigroup T

Effectively, this limits semigroups to being those type that are also monoids.

In theory, one might lift this restriction enabling the overlapping instances extension, but that is generally regarded as a bad idea. Once we do so, the instance selection rules become more complex and fragile. Now, if one is careful about which instances are in scope, e.g. avoiding orphan instances, one can live with overlapping instances. Still, they are best to be avoided when possible.

CodePudding user response：

In your example both definitions are equivalent since one can derive the <> implementation using mappend. There is a slight difference that may not be immediately apparent:

class Semigroup a => Monoid a where ...

Means that for a to be a Monoid it must first be a Semigroup

class Monoid a where ...
instance Monoid a => Semigroup a where ...

Means that if a is a Monoid then it's also a Semigroup and you state that by providing the instance implementation. In the first case one who wants to implement an instance of Monoid for his own data type will have to also implement a Semigroup instance:

newtype PlusInt = I Int 

class Semigroup a => Monoid a where ...

instance Monoid PlusInt where 
    mempty = I 0 
    mappend (I n) (I m) = I $ n   m
-- ^ Error: No instance for (Semigroup PlusInt)

So which one to use really depends on the case:

by definition a Monoid must be a Semigroup so the definition is class Semigroup a => Monoid a where ...
by definition an Applicative must be a Functor so the type class is class Functor f => Applicative f where ...
If a is a Semigroup then Maybe a is a Semigroup so you have an instance declaration like instance Semigroup a => Semigroup (Maybe a) where ...
...

CodePudding user response：

There is a fundamental error in how you're reading the instance constraint. It is important (and not obvious at first) to remember that constraints on instances are checked after committing to the instance.

To see whether an instance matches a given type you only look at the stuff to the right of the => (the instance head). If it matches then the compiler commits to the instance and the constraints on the instance become additional requirements for the typechecker to consider. If those fail you will have a type error; it's too late to consider another instance¹. The compiler will only consider other instances if the instance head does not match at all.

This means that instance Monoid a => Semigroup a does not mean "any type that is a Monoid is a Semigroup". Rather it means "every type a is a Semigroup, but to use any Semigroup methods with a requires you to prove Monoid a first".

As a consequence of these rules, there can be no other Semigroup instances (because instance ... => Semigroup a matches every type). Which means the implication of the instance you wrote is actually backwards from the original class constraint: class Semigroup a => Monoid a means "if a type is a Monoid then it is also a Semigroup", whereas instance Monoid a => Semigroup a effectively means "if a type is a Semigroup then it is also a Monoid".

That's all you really need to know. The answer to your question about why both features exist is trivially "they're not the same", and the answer to which should be preferred is pretty clear once you know what they both mean; since they mean different things you have to consider what you want to say and use the most appropriate feature on a case by case basis. The rest of this answer is trying to explain why the compiler works this way; I find it's more intuitive when you understand the problems the rules are trying to avoid.

The reason for these rules is basically to support separate compilation of modules while trying to retain coherence of the type class system (meaning that any time we resolve an instance for a given type we want to be able to assume it will be the exact same instance definition, consistently across an entire multi-package program, no matter how we found it; APIs like Data.Map fundamentally rely on this property). The compiler only processes one module at a time (and its imports, to a lesser degree), so it doesn't want to assume that it can see every single instance declaration that will be used in the final complete program. That means the compiler never wants to do anything that would only be correct if an instance does not exist just because it hasn't seen such an instance.

This is the guiding principle that explains everything: the compiler can only ever use positive information about instances (that a type does have an instance) to compile your program. It will only ever use negative information (that an instance appears to be missing) to reject your program, never to accept it.

So more concretely, if we have:

module Monoid where

class Monoid a where
  mappend :: a -> a -> a
  mempty :: a

class Semigroup a where
  (<>) :: a -> a -> a

instance Monoid a => Semigroup a where
  (<>) = mappend

and also

module DataDefinition where

data Example = Constructor

and finally

module DataUsage where

import DataDefinition
import Monoid

foo :: Example -> Example -> Example
foo x y = x <> y

The compiler will reject foo because of a missing Monoid instance for Example, not a missing Semigroup instance. It has to resolve the Semigroup Example constraint using the Monoid a => Semigroup a instance; because another module in the final program (importing both Monoid and DataDefinition, but not importing or imported by DataUsage) might theoretically have a Monoid Example instance and also use the Module a => Semigroup a instance for Example. If that happened then the only way to maintain the property that the entire program always uses the same Semigroup instance for T is that inside DataUsage we have to use the Monoid a => Semigroup a one as well. And since we can't rule out whether that happens in another module, we have to do that regardless of whether it actually does happen! But that then requires we have to look for a Monoid T instance, which we don't find, and we're allowed to use the non-existence of the Monoid instance to report an error.

If we wanted T to be a Semigroup only and add:

instance Semigroup T where
  C <> C = C

then we get an error about overlapping instances because instance Semigroup T and instance Monoid a => Semigroup a both match the usage in bar. The logic that we can't allow this is a bit more subtle, but it comes down to the same thing. Another module could conceivably be working with T without actually knowing it, via instantiation of a polymorphic type. That module wouldn't use the specific Semigroup T instance (because it's working with a polymorphic type, not T specifically). But it might use the Monoid a => Semigroup a instance, and if it was ever called on T (which would require there to be a Monoid T instance out there somewhere that isn't imported into the DataUsage module we're compiling right now) then it would use the general Semigroup instance with T, instead of using the T-specific one. So since we can't rule that out we can't use the specific one either! Since that reasoning would always apply when you have both a specific and a general instance that can match the same type, overlapping instances are simply reported as an error in themselves (although I believe only when trying to solve a related constraint; you might technically be able to include overlapping instances if you never attempt to use them with any types they both match).

There are compiler extensions in GHC that allow you to relax these rules, permitting multiple overlapping instances to be used by choosing the "best match" at each use site rather than requiring that there only ever be a single possible match. I'll leave it to the appropriate section in the GHC users guide to explain all the gory details.

This can be a powerful feature, but it can also be a bit fragile; if GHC picks a different instance from the available ones that match than you thought it would that can often cause subtle bugs that are difficult to diagnose. In my experience overlapping is more often used for special-purpose classes where a single library provides all of the instances that are expected to exist, rather than with general-purpose things like Semigroup that are expected to be widely instantiated across many packages.

I don't recommend you try using overlapping instances until you're really very familiar and comfortable with the normal instance resolution rules, and know that you're trying to do something that cannot be modelled well without overlapping instances. Especially I don't recommend seeing overlapping instances as a way to write "universal" instances like instance Monoid a => Semigroup a, just to avoid writing easy instances for each type. It looks tempting, but I don't think it's worth it.

¹ Actually if there are other instances whose heads matched that in itself would be detected as an error!