How to understand Haskell typeclass definitions with "| m -> s" and the like-CodePudding

I can't seem to find books really explaining what the | m -> s means in the MonadState type class definition:

class Monad m => MonadState s m | m -> s where
  {- other stuff here -}

The book I'm currently reading explains

The s type variable represents a type of state. It is uniquely determined by the monad m as specified by the | m -> s part.

but I don't understand what that exactly means.

The extent to which I understand is that m has to be some type with a Monad instance, and m will have a value that it contains. Does it mean that s is what would appear as m s if the provided monad were to show up in a type expression--i.e. something like return :: Monad m => s -> m s?

CodePudding user response：

It's called a functional dependency. All it does is put some restrictions on the instances you're allowed to define for that class.

It means the variables to the left of the arrow (m in this case) must determine the ones to the right of the arrow (s in this case).¹ You are not allowed to define multiple instances such that you are unable to tell what s should be immediately upon making a choice for m.

For example, this is illegal:

data MyStateMonad = ...

instance MonadState MyStateMonad Int

instance MonadState MyStateMonad Bool

With the above instances, once m is determined to be MyStateMonad we can't immediately deduce what s must be; there are instances for both Int and Bool. That would be perfectly fine in a normal multi-parameter type class, but the functional dependency m -> s forbids it.

The main thing it achieves (why the authors wanted to impose that restriction) is that GHC is aware of the restriction, and can use it during type inference. It would be very common to have a function that takes a monadic value with a MonadState constraint (m a for some a), but only interacts with the s state type internally. If GHC had to consider the possibility that there could be many MonadState instances for each m, with varying s, then code like this would frequently be ambiguous; the programmer would have to add in more type annotations to get it to compile.

But that's usually not a possible situation anyway, since a lot of MonadState instances are ultimately built on StateT², which looks like this:

newtype StateT s m a = StateT { runStateT :: s -> m (a,s) }

This already has a type parameter for the state! We're almost never going to want a type M with a StateT Int _ _ inside it to have a MonadState M Bool instance; we've already determined that the state should be Int by one of the parameters to the StateT we use. The only way it could really make sense is if M had both a StateT Int _ _ inside it and a StateT Bool _ _; theoretically we could then have wanted two MonadState instances for M with the types controlling which one is used.³

But instead the authors decided to use a functional dependency to sacrifice the ability to write such things, and instead better support the use case where there is only a single state type that makes sense for any given state monad. Now as soon as your type signatures have pinned down the monad you're using, the compiler can use this to identify the MonadState instance which then tells it what s must be. This effectively adds type information to every use of MonadState methods without requiring the programmer to write additional type annotations.

¹ Or phrased the other way around, the ones to the right of the arrow depend upon the ones on the left - in exactly the same way that a function's result depends upon its arguments (hence the term functional dependency).

² By "ultimately built on" I don't necessarily mean that the MonadState instance is literally for some combiatino of StateT's parameters. Just that somewhere inside the first parameter of the instance will often be a StateT, with MonadState instances lifting the "state" behaviour out to the top.

³ I'm not actually sure whether you could even get that to pass the monad laws.