It kind of seems like in Haskell one can use instance declarations similarly to superclass constraints. I'll give an example using Semigroup
and Monoid
from Prelude. Here's a simplified definition of Semigroup
:
class Semigroup a where
(<>) :: a -> a -> a
Here's a simplified version of Prelude's definition of Monoid
which uses a superclass constraint:
class Semigroup a => Monoid a where
mempty :: a
mappend :: a -> a -> a
mappend = (<>)
If I'm not mistaken it seems the superclass constraint could be replaced by an instance declaration as follows:
class Monoid a where
mempty :: a
mappend :: a -> a -> a
instance Monoid a => Semigroup a where
(<>) = mappend
Is there a reason for Haskell to have superclass constraints when it seems one can do the same with instance declarations (with constraints)? Which one should be preferred? Is it superclass constraints whenever possible?
CodePudding user response:
If you write instance Monoid a => Semigroup a
you're essentially declaring Semigroup
to be a synonym of Monoid
. With that instance in place, all Monoid
instances will be Semigroup
instances – so far it's like the real superclass relation. But vice versa any constraint Semigroup a
appearing anywhere in your code will immediately match the instance head, i.e. effectively you also have the relationship the other way around: the only way for a type to be an instance of Semigroup
happens to be by first being an instance of Monoid
, which is of course not what you wanted.
There's a way to circumvent this: you could still have overlapping instances that instantiate Semigroup
without Monoid
. Or rather, you could make it
instance {-# OVERLAPPABLE #-} Monoid a => Semigroup a
But I would advise against this: overlapping instance are IMO fundamentally violating the principle of how typeclasses work in Haskell, they regularly lead to weird surprises, and it's unnecessary because Haskell already has a good way of declaring this relationship: by making Semigroup
a superclass of Monoid
!
There is one caveat to this, and it admittedly a situation that turns up quite a lot: if you can't control class B
, and you write your own class A
, and it should be a superclass of B
but you can't change B
. In fact, exactly that has happened historically: before GHC-8, the Monoid
class had no superclass, and Semigroup
was defined in a separate package.
I don't really know a good solution for that situation. Best is if you can convince the maintainer of B
to pull your A
superclass into the package to have the proper A => B
superclass relationship. Short of that, it can sometimes help to write a new class
class (A x, B x) => B' x
instance (A x, B x) => B' x
CodePudding user response:
This instance has a main downside:
instance Monoid a => Semigroup a where
(<>) = mappend
First, the positive side: the instance does cause each monoid type to become a semigroup, as wanted.
However, it is not accepted unless we enable UndecidableInstances
. It is not a huge downside, since this is a mostly benign extension, but it is something that has to be kept in mind.
The main issue, though, is that the instance prevents any other type from being a semigroup. More precisely, it prevents us to define any semigroup that is not a monoid.
This is because once we have an instance of the form
instance ... => Semigroup a where
...
then this will overlap with any other instance for Semigroup
, triggering errors.
data T = T
instance Semigroup T where
...
Overlapping instances for Semigroup T
Matching instances:
instance Monoid a => Semigroup a
instance Semigroup T
Effectively, this limits semigroups to being those type that are also monoids.
In theory, one might lift this restriction enabling the overlapping instances extension, but that is generally regarded as a bad idea. Once we do so, the instance selection rules become more complex and fragile. Now, if one is careful about which instances are in scope, e.g. avoiding orphan instances, one can live with overlapping instances. Still, they are best to be avoided when possible.
CodePudding user response:
In your example both definitions are equivalent since one can derive the <>
implementation using mappend
. There is a slight difference that may not be immediately apparent:
class Semigroup a => Monoid a where ...
Means that for a
to be a Monoid
it must first be a Semigroup
class Monoid a where ...
instance Monoid a => Semigroup a where ...
Means that if a
is a Monoid
then it's also a Semigroup
and you state that by providing the instance implementation.
In the first case one who wants to implement an instance of Monoid
for his own data type will have to also implement a Semigroup
instance:
newtype PlusInt = I Int
class Semigroup a => Monoid a where ...
instance Monoid PlusInt where
mempty = I 0
mappend (I n) (I m) = I $ n m
-- ^ Error: No instance for (Semigroup PlusInt)
So which one to use really depends on the case:
- by definition a
Monoid
must be aSemigroup
so the definition isclass Semigroup a => Monoid a where ...
- by definition an
Applicative
must be aFunctor
so the type class isclass Functor f => Applicative f where ...
- If
a
is aSemigroup
thenMaybe a
is aSemigroup
so you have an instance declaration likeinstance Semigroup a => Semigroup (Maybe a) where ...
- ...
CodePudding user response:
There is a fundamental error in how you're reading the instance constraint. It is important (and not obvious at first) to remember that constraints on instances are checked after committing to the instance.
To see whether an instance matches a given type you only look at the stuff to the right of the =>
(the instance head). If it matches then the compiler commits to the instance and the constraints on the instance become additional requirements for the typechecker to consider. If those fail you will have a type error; it's too late to consider another instance1. The compiler will only consider other instances if the instance head does not match at all.
This means that instance Monoid a => Semigroup a
does not mean "any type that is a Monoid
is a Semigroup
". Rather it means "every type a
is a Semigroup
, but to use any Semigroup
methods with a
requires you to prove Monoid a
first".
As a consequence of these rules, there can be no other Semigroup
instances (because instance ... => Semigroup a
matches every type). Which means the implication of the instance you wrote is actually backwards from the original class constraint: class Semigroup a => Monoid a
means "if a type is a Monoid
then it is also a Semigroup
", whereas instance Monoid a => Semigroup a
effectively means "if a type is a Semigroup
then it is also a Monoid
".
That's all you really need to know. The answer to your question about why both features exist is trivially "they're not the same", and the answer to which should be preferred is pretty clear once you know what they both mean; since they mean different things you have to consider what you want to say and use the most appropriate feature on a case by case basis. The rest of this answer is trying to explain why the compiler works this way; I find it's more intuitive when you understand the problems the rules are trying to avoid.
The reason for these rules is basically to support separate compilation of modules while trying to retain coherence of the type class system (meaning that any time we resolve an instance for a given type we want to be able to assume it will be the exact same instance definition, consistently across an entire multi-package program, no matter how we found it; APIs like Data.Map
fundamentally rely on this property). The compiler only processes one module at a time (and its imports, to a lesser degree), so it doesn't want to assume that it can see every single instance declaration that will be used in the final complete program. That means the compiler never wants to do anything that would only be correct if an instance does not exist just because it hasn't seen such an instance.
This is the guiding principle that explains everything: the compiler can only ever use positive information about instances (that a type does have an instance) to compile your program. It will only ever use negative information (that an instance appears to be missing) to reject your program, never to accept it.
So more concretely, if we have:
module Monoid where
class Monoid a where
mappend :: a -> a -> a
mempty :: a
class Semigroup a where
(<>) :: a -> a -> a
instance Monoid a => Semigroup a where
(<>) = mappend
and also
module DataDefinition where
data Example = Constructor
and finally
module DataUsage where
import DataDefinition
import Monoid
foo :: Example -> Example -> Example
foo x y = x <> y
The compiler will reject foo
because of a missing Monoid
instance for Example
, not a missing Semigroup
instance. It has to resolve the Semigroup Example
constraint using the Monoid a => Semigroup a
instance; because another module in the final program (importing both Monoid
and DataDefinition
, but not importing or imported by DataUsage
) might theoretically have a Monoid Example
instance and also use the Module a => Semigroup a
instance for Example
. If that happened then the only way to maintain the property that the entire program always uses the same Semigroup
instance for T
is that inside DataUsage
we have to use the Monoid a => Semigroup a
one as well. And since we can't rule out whether that happens in another module, we have to do that regardless of whether it actually does happen! But that then requires we have to look for a Monoid T
instance, which we don't find, and we're allowed to use the non-existence of the Monoid
instance to report an error.
If we wanted T
to be a Semigroup
only and add:
instance Semigroup T where
C <> C = C
then we get an error about overlapping instances because instance Semigroup T
and instance Monoid a => Semigroup a
both match the usage in bar
. The logic that we can't allow this is a bit more subtle, but it comes down to the same thing. Another module could conceivably be working with T
without actually knowing it, via instantiation of a polymorphic type. That module wouldn't use the specific Semigroup T
instance (because it's working with a polymorphic type, not T
specifically). But it might use the Monoid a => Semigroup a
instance, and if it was ever called on T
(which would require there to be a Monoid T
instance out there somewhere that isn't imported into the DataUsage
module we're compiling right now) then it would use the general Semigroup
instance with T
, instead of using the T
-specific one. So since we can't rule that out we can't use the specific one either! Since that reasoning would always apply when you have both a specific and a general instance that can match the same type, overlapping instances are simply reported as an error in themselves (although I believe only when trying to solve a related constraint; you might technically be able to include overlapping instances if you never attempt to use them with any types they both match).
There are compiler extensions in GHC that allow you to relax these rules, permitting multiple overlapping instances to be used by choosing the "best match" at each use site rather than requiring that there only ever be a single possible match. I'll leave it to the appropriate section in the GHC users guide to explain all the gory details.
This can be a powerful feature, but it can also be a bit fragile; if GHC picks a different instance from the available ones that match than you thought it would that can often cause subtle bugs that are difficult to diagnose. In my experience overlapping is more often used for special-purpose classes where a single library provides all of the instances that are expected to exist, rather than with general-purpose things like Semigroup
that are expected to be widely instantiated across many packages.
I don't recommend you try using overlapping instances until you're really very familiar and comfortable with the normal instance resolution rules, and know that you're trying to do something that cannot be modelled well without overlapping instances. Especially I don't recommend seeing overlapping instances as a way to write "universal" instances like instance Monoid a => Semigroup a
, just to avoid writing easy instances for each type. It looks tempting, but I don't think it's worth it.
1 Actually if there are other instances whose heads matched that in itself would be detected as an error!