Home > Software engineering >  Selecting a type-class instance based on the context (once again)
Selecting a type-class instance based on the context (once again)

Time:11-18

Ok this topic had been discussed a lot of times, but since Haskell evolves lets consider it again to see what we can do in contemporary Haskell (by contemporary I mean GHC 9.0 - 9.2).

Fist let me state the problem by an example. Suppose we have a function which determines a number of bytes required to store a value of a given type. We can have two instances of this function: one for fixed sized data types and other for variable sized. For example Int32 is fixed sized and always takes 4 bytes to store regardless it's value. But data C = A Int32 | B Int32 Int32 can be considered variable sized since it may take 4 bytes to store in case of A constructor or 8 bytes in case of B constructor. It's natural to have two classes for this:

  1. A class for fixed sized values. Note that value itself not required, we can use Proxy as a parameter to determine the size.
class FixedSize a where 
  fixedSize :: p a -> Int
  1. A class for variable sized values. The function takes a value to determine the size.
class VariableSize a where 
  variableSize :: a -> Int

Now lets say we want to define a function which determines the size of a list of values. The values in the list can be either fixed or variable sized. So it's natural to have two functions:

  1. One for a list of fixed sized values.
listSize :: (FixedSize a) => [a] -> Int
listSize _ = (* fixedSize (Proxy @a) ) . length
  1. Other for a list of variable sized values.
listSize :: (VariableSize a) => [a] -> Int
listSize = sum . map variableSize

However it is not possible to use a naive approach, the following basically won't compile:

class Size a where 
  size :: a -> Int

instance (FixedSize a) => Size [a] where
  size = _ = (* fixedSize (Proxy @a) ) . length

instance (VariableSize a) => Size [a] where
  size = sum . map variableSize

This happens because Haskell relies on type when selecting an instance, but not on the context. There are trick to overcome this limitation described here: https://wiki.haskell.org/GHC/AdvancedOverlap. The basic idea is to define type-level predicate which reflects the context and use it to select an instance using multi-parameter type classes an overlapping instances. In this case Haskell will be able to select more specific instance based on the type parameters. By "more specific" I mean matching type-level predicates.

The proposed approaches can be divided into three groups conditionally.

  1. Use closed type families to define a type-level predicate ("Solution 3" according to the wiki-page). This is not usable approach because it will disallow user to define instances for custom data types. I won't discuss it further.

  2. Define the predicate as a separate type class, define default (fallback) overlappable instance for the predicate ("Solution 1" according to the wiki-page). This is working approach, but it requres from user to maintain additional instances for the predicate.

  3. Use open type families ("Solution 2"). I'd like to discuss slightly modified version of this approach.

class Size a where 
  size :: a -> Int

class FixedSize a where
  type FixedSized a :: Bool
  type FixedSized a = 'True
  fixedSize :: p a -> Int

#include "MachDeps.h"
instance FixedSize Int where
  fixedSize _ = SIZEOF_HSINT

class ListSize (isElemFixed :: Bool) a where
  listSize :: p isElemFixed -> a -> Int

instance (ListSize (FixedSized a) [a]) => Size [a] where
  size = listSize $ Proxy @(FixedSized a)

instance (FixedSize a) => ListSize 'True [a] where
  listSize _ = trace "elem size is fiхed" . (* fixedSize (Proxy @a) ) . length

instance {-# INCOHERENT #-} (Size a) => ListSize any [a] where
  listSize _ = trace "elem size is variable" . sum . map size

test1 = size [1::Int,2,3]

test2 = size [[1::Int], [2,3,4]]

This approach seems the most convenient user-wise to me. The separate type-level predicate facility is still required and user can still mess up by defining something like this explicitly:

class FixedSize UserType where
  type FixedSized UserType = 'False

but it just works as expected when using defaults.

However, it reqires incoherent instances. And I'm scare of incoherent instances. Because the Hasllkel documentation leterelly says that in case of incoherent instances the compiler is free to choose any instance it wants which looks unpredictable. Now I'll probably doing a bad thing by asking 4 questions in one post but they all related:

  1. Why incoherent instances are needed here exactly? Does not ListSize 'True [a] just overlap with ListSize any [a] and could be picked when first paramenter evaluates to True?

  2. Is there a way to break this code? I mean, to make a complier to choose ListSize any [a] (variable sized elem code) when FixedSize a is in scope?

  3. Are these instances really incoherent? Probably compiler just can't prove coherence, so how it can be proven manually?

  4. Is there a completely different approach to solve this problem in modern Haskell? By the problem I mean a partial exapmle above, selecting an appropriate function to determine the size of a list of values based on the type of the values in compile time.

CodePudding user response:

This sounds like a use for if-instance.

It allows you to branch on whether a constraint is satisfied, obviously this is not safe since it violates the open-world assumption (see reddit post).

If you claim that your classes are mutually exclusive, then you should be able to safely write

{-# OPTIONS_GHC -fplugin=IfSat.Plugin #-}

import Data.Constraint.If ( type (||) (dispatch) )

instance FixedSize a || VariableSize a => Size [a] where
  size :: [a] -> Int
  size as = dispatch fixed variable where

    fixed :: FixedSize a => Int
    fixed = fixedSize (Proxy @a) * length as

    variable :: VariableSize a => Int
    variable = sum (map variableSize as)

CodePudding user response:

I'll focus on the specific example. Hopefully some of the solutions proposed below will also be applicable to the general case you have in mind.

There is a general theme, though, which is: we avoid having two different classes to choose from.

The plain, basic approach

If the size can be either fixed or variable, we can use a sum type to represent those two cases.

data Scale a = Fixed Int | Variable (a -> Int)

class Size a where
   scale :: Scale a

instance Size Int where
   scale = Fixed 4

instance Size a => Size [a] where
   scale = case scale @a of
      Fixed n    -> Variable (\xs -> n * length xs)
      Variable s -> Variable (\xs -> sum $ map s xs)

size :: forall a. Size a => a -> Int
size x = case scale @a of
   Fixed n    -> n
   Variable s -> s x

Exploit GADTs

We start by enumerating the two cases:

data Sized = Fixed | Variable

-- associated singleton
data SSized (s :: Sized) where
   SFixed    :: SSized 'Fixed
   SVariable :: SSized 'Variable

Then we use a type family to drive the two different types.

type family Scale s a where
   Scale Fixed    _ = Int
   Scale Variable a = a -> Int

class Size a where
   type F a :: Sized
   which :: SSized (F a)
   scale :: Scale (F a) a

Note how the class also provides the singleton. We can then instantiate the class.

instance Size Int where
   type F Int = Fixed
   which = SFixed
   scale = 4

instance Size a => Size [a] where
   type F [a] = Variable
   which = SVariable
   scale xs = case which @a of
      SFixed    -> scale @a * length xs
      SVariable -> sum $ map (scale @a) xs

size :: forall a. Size a => a -> Int
size x = case which @a of
   SFixed    -> scale @a
   SVariable -> scale @a x

Essentially, we collapsed two classes in one: the classes for fixed-sized and variably-sized types are now just one. At the type level, we can distinguish between the two cases using the type family F a, while at the term level we can we can distinguish between the two cases using the singleton which @a.

CodePudding user response:

One possibility would be to provide an additional class for the size of containers, together with newtypes to be used with deriving. Users will still have to write an instance for this new class, but it will be a single, short line of code. Like this:

class ContainerSize a where
    containerSize :: Foldable f => f a -> Int

newtype FixedElement a = FixedElement a
instance FixedSize a => ContainerSize (FixedElement a) where
    containerSize c = fixedSize @a * length c

newtype VariableElement a = VariableElement { getVE :: a }
instance VariableSize a => ContainerSize (VariableElement a) where
    containerSize c = sum (variableSize . getVE <$> c)

instance ContainerSize a => VariableSize [a] where
    variableSize = containerSize

Now, when users would previously have written something like

instance FixedSize X where ...
instance VariableSize Y where ...

they now add the lines

deriving instance ContainerSize X via FixedElement X
deriving instance ContainerSize Y via VariableElement Y

(If this feels a bit like a special-case hack for lists, let me encourage you that this is very similar to something in the standard libraries that has worked well for several decades, showList, so that's some evidence that it can cover quite a lot of ground after all!)

  • Related