Home > OS >  When are the results of default methods in instances cached?
When are the results of default methods in instances cached?

Time:11-25

Consider the following module:

{-# LANGUAGE GeneralisedNewtypeDeriving #-}
{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE DefaultSignatures #-}

module Lib where

import Data.List (foldl')

doBigSum :: (Enum a, Num a) => a
doBigSum = foldl' ( ) 0 [1..200000000]

f :: (Enum a, Num a) => a -> a
f x = x   doBigSum

class BigSum a where
  bigSum :: a
  default bigSum :: (Enum a, Num a) => a
  bigSum = doBigSum

newtype A = A Integer deriving newtype (Enum, Num, Show)
newtype B = B Integer deriving newtype (Enum, Num, Show)

instance BigSum A where
  bigSum = doBigSum

instance BigSum B

g :: (Num a, BigSum a) => a -> a
g x = x   bigSum

Lets assume we're using GHC here also.

There's somethings I'd note here (which I believe to be true, correct me if I'm wrong):

  1. Unless there's some fancy optimisation/inlining, there's a good chance that doBigSum will NOT be cached and instead recomputed for each reference, because doBigSum actually takes a hidden argument which is the typeclass dictionary for the type a it's being instanced on.
  2. However, in the instance definition BigSum A, bigSum will be cached, and every subsequent reference will use that value.

Indeed, this is what I see, if I create a main function like this:

import Lib

main :: IO ()
main = do
  print "Start"
  print ((f 1) :: A)
  print ((f 2) :: A)

And compile without optimisation (the separate modules are important here), there is clearly a gap in time between the output of the two print statements.

But if I do:

import Lib

main :: IO ()
main = do
  print "Start"
  print ((g 1) :: A)
  print ((g 2) :: A)

Then the result of g 2 is printed immediately after the result of g 1. Clearly the instance definition for BigSum A results in a separate constant for bigSum :: A being created.

Now consider

import Lib

main :: IO ()
main = do
  print "Start"
  print ((g 1) :: B)
  print ((g 2) :: B)

Note the instance definition for BigSum B is not explicit, it depends on the default.

Now what happens here? Is it:

  1. One implementation of bigSum, namely the default, which has a hidden parameter of the type, much like doBigSum does, so the result is not cached OR
  2. There is a separate implementation of bigSum for each instance of BigSum, which is specialised for the type in question, so that when calling bigSum for a particular type it is only calculated once for that type.

My tests imply that what happens is case (2), which is good for my use case, but I was wondering how much I can rely on this.

My actual use case is more like the following:

data ConversionInfo a = ...

data Conversions a = Conversions { convA :: a -> A, convB :: a -> B, convC :: a -> C } 

f :: ConversionInfo a -> Conversions a
f = ... -- Lots of work happens here

class SimpleConversion a where
  conversionInfo :: ConversionInfo a
  conversions :: Conversions a
  conversions = f conversionInfo

class Conversions a where
  conversionA :: a -> A
  default conversionA :: SimpleConversion a => a -> A
  conversionA = convA conversions

  conversionB :: a -> B
  default conversionB :: SimpleConversion a => a -> B
  conversionB = convB conversions

  conversionC :: a -> C
  default conversionC :: SimpleConversion a => a -> C
  conversionC = convC conversions

What I want to be reliably sure of is that f is not going to be recomputed every time I call conversionX blah for some X and blah. Instead I want f to be only ran once per type of SimpleConversion. Anything else will completely blow up the runtime cost, because f does a lot of work as compared to the actual conversions.

Any documentation/references on this would be appreciated.

CodePudding user response:

Short Answer: It's not important whether a method is defined by means of a default class method or a per-instance definition. What matters is that the method's type be monomorphic within a particular instance definition. This means that the type of the method must be monomorphic once the instance type variables are resolved (e.g., bigSum :: A), and that the instance itself be monomorphic (e.g., instance BigSum A, but not instance BigSum [a]). In this case, you are guaranteed that only one bigSum value per instance will be evaluated.

Long Answer: As you are undoubtedly aware, early in the compilation process, classes are desugared into dictionary data types, instances are desugared into values of those data types, and constraints are converted into explicit dictionary arguments. Default methods are simply desugared into polymorphic functions that will be used where necessary to create the method fields for unspecific methods in declared instances. In other words, your example is desugared into more or less the following:

-- dictionary-passing versions of `doBigSum` and `f`
doBigSum :: Enum a -> Num a -> a
doBigSum dEnum dNum = ...

f :: Enum a -> Num a -> a -> a
f dEnum dNum x = x   doBigSum dEnum dNum

-- the BigSum class
data BigSum a = BigSum { bigSum :: a }

-- default class method for bigSum
dm_bigSum :: BigSum a -> Enum a -> Num a -> a
dm_bigSum _dBigSum dEnum dNum = doBigSum dEnum dNum

with the derived dictionaries and instances for A desugared into:

newtype A = A Int
fEnumA = coerce fEnumInt   -- `Enum A` dictionary
fNumA = coerce fNumInt     -- `Num A` dictionary

fBigSumA :: BigSum A
fBigSumA = BigSum { bigSum = bigSumA }

bigSumA :: A
bigSumA = doBigSum fEnumA fNumA

and the derived dictionaries and instances for B desugared into:

newtype B = B Int
fEnumB = coerce fEnumInt
fNumB = coerce fNumInt

fBigSumB :: BigSum B
fBigSumB = BigSum { bigSum = bigSumB }

bigSumB :: B
bigSumB = dm_bigSum fBigSumB fEnumB fNumB

and g desugared into:

g :: Num a -> BigSum a -> a -> a
g dNum dBigSum x = ( ) dNum x (bigSum dBigSum)

Note that both A (which does not use the default method) and B (which does) have a bigSum field that's defined by means of a polymorphic function (doBigSum directly for A; doBigSum by way of the polymorphic function dm_bigSum for B). What matters is that both bigSum field values -- namely bigSumA and bigSumB -- are named, monomorphic values.

In calls to f, the expression doBigSum dEnum dNum gets re-evaluated every time f is called. However, in calls to g, the field selection expression bigSum dBigSum just selects the monomorphic field from the dictionary. If you call it a dozen times at type A, it just returns the same bigSumA :: A thunk.

So, you are guaranteed that bigSum will be evaluated once per instance for this example.

However, this guarantee relies on two things. First, it relies on the method being monomorphic after the instance parameters are resolved. If you had:

class BigPair a where
  bigPair :: (Num b) => (a,b)
  default bigPair :: (Enum a, Num a, Num b) => (a, b)
  bigPair = (doBigSum, 123456789)

instance BigPair A where
  bigPair = (doBigSum, 123456789)
instance BigPair B

g :: (Num a, BigPair a, Num b) => a -> (a, b)
g x = let (a, b) = bigPair in (a x, b)

you might find that an expression like g 1 :: (A, Int) causes re-evaluation of doBigSum every time it's run. Same with g 1 :: (B, Int). The issue is that the field in the instance dictionary for A is of type forall b. Num b => (A, b), so it's not monomorphic. The compiler might lift the evaluation of doBigSum out of the field expression, but you're relying on an optimization now. In my testing, running under GHCi or compiled with -O0, doBigSum was re-evaluated every time. Compiled with -O2, it was only evaluated once.

Second, the it relies on the instances themselves being monomorphic. You can easily write a polymorphic instance that can result in bigSum being re-evaluated every time it's called. Consider:

instance (Enum a, Num a) => BigSum [a] where
  bigSum = [doBigSum]

This desugars into a function that generates a family of instance dictionaries:

fBigSumList :: Enum a -> Num a -> BigSum [a]
fBigSumList dEnum dNum = BigSum { bigSum = bigSumList dEnum dNum }

bigSumList :: Enum a -> Num a -> a
bigSumList dEnum dNum = [doBigSum dEnum dNum]

Now, there is no longer any named, monomorphic field value for any particular type a. The expression:

bigSum :: [Int]

desugars into the call:

bigSum (fBigSumList fEnumInt fNumInt) 
= bigSumList fEnumInt fNumInt
= [doBigSum dEnum dNum]

which can cause re-evaluation of doBigSum.

In my testing of a simple example, GHCi re-evaluated doBigSum every time bigSum :: [Int] was evaluated. However, compiling even with -O0 only evaluated it once, but probably because of common subexpression elimination in the main function, so this is a potentially very fragile optimization.

In your Conversions example, the field Conversions a should be monomorphic once the type variable a is resolved, so I believe that any monomorphic instances you define should share a single evaluated copy of conversions. (One caveat: since the fields are functions, you are relying on the "work" that f is doing being appropriately lifted out of the collection of functions that f is returning.) If you change the definition of the Conversions data type so it contains some higher-rank polymorphism, or if you define polymorphic SimpleConversion instances, you may lose this guarantee.

You can examine the precise desugaring by compiling with something like:

ghc -O0 -ddump-ds -dsuppress-all -dno-suppress-type-applications 
        -dno-suppress-type-signatures -dsuppress-uniques -fforce-recomp 
        ...

Be warned that classes with a single method are desugared into newtypes, so the instance dictionaries end up being coercions of the single field, rather than data types with multiple, named fields. For example, your code actually desugars into something like this:

-- the polymorphic function `doBigSum`
doBigSum :: forall a. (Enum a, Num a) => a
doBigSum = \ @a $dEnum $dNum -> ...

-- the polymorphic default method
$dmbigSum :: forall a. (BigSum a, Enum a, Num a) => a
$dmbigSum = \ @a _ $dEnum $dNum -> doBigSum @a $dEnum $dNum

-- the instance dictionary for A
-- (note: because it's only got one field, it's a cast *of* that field)
$fBigSumA :: BigSum A
$fBigSumA = $cbigSum_A `cast` <Co:3>

-- bigSum as defined in the A instance
$cbigSum_A :: A
$cbigSum_A = doBigSum @A $fEnumA $fNumA

-- the instance dictionary for B
$fBigSumB :: BigSum B
$fBigSumB = $cbigSum_B `cast` <Co:3>

-- bigSum from the default method
$cbigSum_B :: B
$cbigSum_B = $dmbigSum @B $fBigSumB $fENumB $fNumB
  • Related