Consider the following module:
{-# LANGUAGE GeneralisedNewtypeDeriving #-}
{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE DefaultSignatures #-}
module Lib where

import Data.List (foldl')

doBigSum :: (Enum a, Num a) => a
doBigSum = foldl' (+) 0 [1..200000000]

f :: (Enum a, Num a) => a -> a
f x = x + doBigSum

class BigSum a where
  bigSum :: a
  default bigSum :: (Enum a, Num a) => a
  bigSum = doBigSum

newtype A = A Integer deriving newtype (Enum, Num, Show)
newtype B = B Integer deriving newtype (Enum, Num, Show)

instance BigSum A where
  bigSum = doBigSum

instance BigSum B

g :: (Num a, BigSum a) => a -> a
g x = x + bigSum
Let's assume we're using GHC here.
There are some things I'd note here (which I believe to be true; correct me if I'm wrong):
- Unless there's some fancy optimisation/inlining, there's a good chance that doBigSum will NOT be cached and will instead be recomputed for each reference, because doBigSum actually takes a hidden argument: the typeclass dictionary for the type a it's being instantiated at.
- However, in the instance definition BigSum A, bigSum will be cached, and every subsequent reference will use that value.
Indeed, this is what I see, if I create a main function like this:
import Lib

main :: IO ()
main = do
  print "Start"
  print ((f 1) :: A)
  print ((f 2) :: A)
And compile without optimisation (the separate modules are important here), then there is clearly a gap in time between the output of the two print statements.
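To make that gap measurable rather than eyeballed, the calls can also be timed explicitly. This variant is a sketch of mine using getCurrentTime and diffUTCTime from the time package, which the experiment above doesn't need:

import Data.Time.Clock (getCurrentTime, diffUTCTime)
import Lib

-- print a value and report how long forcing it took
timed :: Show a => a -> IO ()
timed x = do
  t0 <- getCurrentTime
  print x
  t1 <- getCurrentTime
  print (diffUTCTime t1 t0)

main :: IO ()
main = do
  print "Start"
  timed ((f 1) :: A) -- slow: doBigSum evaluated here
  timed ((f 2) :: A) -- slow again: nothing is shared between the calls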
But if I do:
import Lib

main :: IO ()
main = do
  print "Start"
  print ((g 1) :: A)
  print ((g 2) :: A)
Then the result of g 2 is printed immediately after the result of g 1. Clearly the instance definition for BigSum A results in a separate constant for bigSum :: A being created.
Now consider
import Lib

main :: IO ()
main = do
  print "Start"
  print ((g 1) :: B)
  print ((g 2) :: B)
Note that the instance definition for BigSum B is not explicit; it depends on the default. Now what happens here? Is it:
- One implementation of bigSum, namely the default, which has a hidden parameter for the type, much like doBigSum does, so the result is not cached, OR
- A separate implementation of bigSum for each instance of BigSum, specialised for the type in question, so that when calling bigSum for a particular type it is only calculated once for that type?
My tests imply that what happens is case (2), which is good for my use case, but I was wondering how much I can rely on this.
My actual use case is more like the following:
data ConversionInfo a = ...

data Conversions a = Conversions
  { convA :: a -> A
  , convB :: a -> B
  , convC :: a -> C
  }

f :: ConversionInfo a -> Conversions a
f = ... -- Lots of work happens here

class SimpleConversion a where
  conversionInfo :: ConversionInfo a

  conversions :: Conversions a
  conversions = f conversionInfo

class Conversions a where
  conversionA :: a -> A
  default conversionA :: SimpleConversion a => a -> A
  conversionA = convA conversions

  conversionB :: a -> B
  default conversionB :: SimpleConversion a => a -> B
  conversionB = convB conversions

  conversionC :: a -> C
  default conversionC :: SimpleConversion a => a -> C
  conversionC = convC conversions
What I want to be reliably sure of is that f is not going to be recomputed every time I call conversionX blah for some X and blah. Instead I want f to be run only once per SimpleConversion instance type. Anything else will completely blow up the runtime cost, because f does a lot of work compared to the actual conversions.
Any documentation/references on this would be appreciated.
Short Answer: It's not important whether a method is defined by means of a default class method or a per-instance definition. What matters is that the method's type be monomorphic within a particular instance definition. This means that the type of the method must be monomorphic once the instance type variables are resolved (e.g., bigSum :: A), and that the instance itself must be monomorphic (e.g., instance BigSum A, but not instance BigSum [a]). In this case, you are guaranteed that only one bigSum value per instance will be evaluated.
Long Answer: As you are undoubtedly aware, early in the compilation process, classes are desugared into dictionary data types, instances are desugared into values of those data types, and constraints are converted into explicit dictionary arguments. Default methods are simply desugared into polymorphic functions that are used to fill in the method fields that declared instances leave unspecified. In other words, your example is desugared into more or less the following:
-- dictionary-passing versions of `doBigSum` and `f`
doBigSum :: Enum a -> Num a -> a
doBigSum dEnum dNum = ...

f :: Enum a -> Num a -> a -> a
f dEnum dNum x = (+) dNum x (doBigSum dEnum dNum)

-- the BigSum class
data BigSum a = BigSum { bigSum :: a }

-- default class method for bigSum
dm_bigSum :: BigSum a -> Enum a -> Num a -> a
dm_bigSum _dBigSum dEnum dNum = doBigSum dEnum dNum
with the derived dictionaries and instances for A desugared into:
newtype A = A Integer

fEnumA = coerce fEnumInteger -- `Enum A` dictionary
fNumA = coerce fNumInteger -- `Num A` dictionary

fBigSumA :: BigSum A
fBigSumA = BigSum { bigSum = bigSumA }

bigSumA :: A
bigSumA = doBigSum fEnumA fNumA
and the derived dictionaries and instances for B desugared into:
newtype B = B Integer

fEnumB = coerce fEnumInteger
fNumB = coerce fNumInteger

fBigSumB :: BigSum B
fBigSumB = BigSum { bigSum = bigSumB }

bigSumB :: B
bigSumB = dm_bigSum fBigSumB fEnumB fNumB
and g desugared into:
g :: Num a -> BigSum a -> a -> a
g dNum dBigSum x = (+) dNum x (bigSum dBigSum)
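To make the mechanics concrete, here is a runnable, hand-rolled version of this desugaring (a sketch: BigSumDict, bigSumField, and gExplicit are names I've made up, and GHC's real Core differs in its details):

import Data.List (foldl')

-- an explicit "dictionary" standing in for the BigSum class
newtype BigSumDict a = BigSumDict { bigSumField :: a }

-- a per-instance dictionary: bigSumField is a plain monomorphic thunk,
-- so every selection from this dictionary shares it
bigSumDictInteger :: BigSumDict Integer
bigSumDictInteger = BigSumDict { bigSumField = foldl' (+) 0 [1..200000000] }

-- dictionary-passing analogue of g: just a field selection plus (+)
gExplicit :: Num a => BigSumDict a -> a -> a
gExplicit dict x = x + bigSumField dict

Evaluating gExplicit bigSumDictInteger 1 and then gExplicit bigSumDictInteger 2 forces the same bigSumField thunk, so the big sum is computed only once.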
Note that both A (which does not use the default method) and B (which does) have a bigSum field that's defined by means of a polymorphic function (doBigSum directly for A; doBigSum by way of the polymorphic function dm_bigSum for B). What matters is that both bigSum field values -- namely bigSumA and bigSumB -- are named, monomorphic values.
In calls to f, the expression doBigSum dEnum dNum gets re-evaluated every time f is called. However, in calls to g, the field selection expression bigSum dBigSum just selects the monomorphic field from the dictionary. If you call it a dozen times at type A, it just returns the same bigSumA :: A thunk.
So, you are guaranteed that bigSum will be evaluated once per instance for this example.
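One quick way to see the evaluation count for yourself (my addition -- the original module doesn't do this) is to wrap the expensive computation in Debug.Trace.trace and watch how many times the message appears:

import Debug.Trace (trace)
import Data.List (foldl')

-- as before, but announces each evaluation on stderr
doBigSum :: (Enum a, Num a) => a
doBigSum = trace "evaluating doBigSum" (foldl' (+) 0 [1..200000000])

With this version in Lib, the f-based main prints the message on both calls, while the g-based main prints it only once per instance type.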
However, this guarantee relies on two things. First, it relies on the method being monomorphic after the instance parameters are resolved. If you had:
class BigPair a where
  bigPair :: (Num b) => (a, b)
  default bigPair :: (Enum a, Num a, Num b) => (a, b)
  bigPair = (doBigSum, 123456789)

instance BigPair A where
  bigPair = (doBigSum, 123456789)

instance BigPair B

g :: (Num a, BigPair a, Num b) => a -> (a, b)
g x = let (a, b) = bigPair in (a + x, b)
you might find that an expression like g 1 :: (A, Int) causes re-evaluation of doBigSum every time it's run. Same with g 1 :: (B, Int). The issue is that the field in the instance dictionary for A is of type forall b. Num b => (A, b), so it's not monomorphic. The compiler might lift the evaluation of doBigSum out of the field expression, but you're relying on an optimization now. In my testing, running under GHCi or compiled with -O0, doBigSum was re-evaluated every time. Compiled with -O2, it was only evaluated once.
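If you run into this, one workaround (my sketch, not part of the original example) is to keep the expensive part in a monomorphic method, so it is stored in the dictionary as a plain value, and reintroduce the polymorphism outside the class:

class BigPart a where
  -- monomorphic once `a` is resolved: one shared thunk per instance
  bigPart :: a
  default bigPart :: (Enum a, Num a) => a
  bigPart = doBigSum

-- the polymorphic pair is rebuilt from the cached monomorphic part
bigPair :: (BigPart a, Num b) => (a, b)
bigPair = (bigPart, 123456789)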
Second, it relies on the instances themselves being monomorphic. You can easily write a polymorphic instance that results in bigSum being re-evaluated every time it's called. Consider:
instance (Enum a, Num a) => BigSum [a] where
  bigSum = [doBigSum]
This desugars into a function that generates a family of instance dictionaries:
fBigSumList :: Enum a -> Num a -> BigSum [a]
fBigSumList dEnum dNum = BigSum { bigSum = bigSumList dEnum dNum }

bigSumList :: Enum a -> Num a -> [a]
bigSumList dEnum dNum = [doBigSum dEnum dNum]
Now, there is no longer any named, monomorphic field value for any particular type a. The expression:
bigSum :: [Int]
desugars into the call:
bigSum (fBigSumList fEnumInt fNumInt)
  = bigSumList fEnumInt fNumInt
  = [doBigSum fEnumInt fNumInt]
which can cause re-evaluation of doBigSum.
In my testing of a simple example, GHCi re-evaluated doBigSum every time bigSum :: [Int] was evaluated. Compiling even with -O0 evaluated it only once, but probably only because of common subexpression elimination in the main function, so this is a potentially very fragile optimization.
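If you need sharing at one particular element type despite a polymorphic instance, one reliable option (my addition, not from the question) is to bind the result to a top-level monomorphic constant and use that instead of the class method:

-- a top-level CAF: evaluated at most once per program run,
-- no matter how the instance dictionary is constructed
bigSumIntList :: [Int]
bigSumIntList = bigSum

Every use of bigSumIntList then shares one list (and one evaluation of its element).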
In your Conversions example, the fields of Conversions a should be monomorphic once the type variable a is resolved, so I believe that any monomorphic instances you define should share a single evaluated copy of conversions. (One caveat: since the fields are functions, you are relying on the "work" that f is doing being appropriately lifted out of the collection of functions that f returns.) If you change the definition of the Conversions data type so it contains some higher-rank polymorphism, or if you define polymorphic SimpleConversion instances, you may lose this guarantee.
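To illustrate that caveat, here is one shape for f that makes the sharing explicit (a sketch over the question's pseudocode; precompute, toA, toB, and toC are hypothetical names):

f :: ConversionInfo a -> Conversions a
f info =
  let table = precompute info      -- hypothetical: the expensive work, done once
  in Conversions
       { convA = \x -> toA table x -- hypothetical helpers; each closure
       , convB = \x -> toB table x -- captures the *same* table thunk
       , convC = \x -> toC table x
       }

As long as table is bound outside the lambdas, all three conversion functions share a single evaluation of the expensive work; if the work instead happened inside each lambda, it would be redone on every call.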
You can examine the precise desugaring by compiling with something like:
ghc -O0 -ddump-ds -dsuppress-all -dno-suppress-type-applications
-dno-suppress-type-signatures -dsuppress-uniques -fforce-recomp
...
Be warned that classes with a single method are desugared into newtypes, so the instance dictionaries end up being coercions of the single field, rather than data types with multiple, named fields. For example, your code actually desugars into something like this:
-- the polymorphic function `doBigSum`
doBigSum :: forall a. (Enum a, Num a) => a
doBigSum = \ @a $dEnum $dNum -> ...
-- the polymorphic default method
$dmbigSum :: forall a. (BigSum a, Enum a, Num a) => a
$dmbigSum = \ @a _ $dEnum $dNum -> doBigSum @a $dEnum $dNum
-- the instance dictionary for A
-- (note: because it's only got one field, it's a cast *of* that field)
$fBigSumA :: BigSum A
$fBigSumA = $cbigSum_A `cast` <Co:3>
-- bigSum as defined in the A instance
$cbigSum_A :: A
$cbigSum_A = doBigSum @A $fEnumA $fNumA
-- the instance dictionary for B
$fBigSumB :: BigSum B
$fBigSumB = $cbigSum_B `cast` <Co:3>
-- bigSum from the default method
$cbigSum_B :: B
$cbigSum_B = $dmbigSum @B $fBigSumB $fEnumB $fNumB