Inlining of function parameters in GHC-CodePudding

I was trying to find the source for a certain kind of inlining that happens in GHC, where a function that is passed as an argument to another function is inlined. For example, I may write a definition like the following (using my own List type to avoid rewrite rules):

data MyList a = Nil | Cons a (MyList a)
  deriving (Show)

mapMyList :: (a -> b) -> MyList a -> MyList b
mapMyList f Nil = Nil
mapMyList f (Cons a rest) = Cons (f a) $ mapMyList f rest

followed by a call

fromList :: [a] -> MyList a
fromList = ...

main = do
  print $ mapMyList (*2) $ fromList [1..5]

mapMyList is recursive, so it is not directly inlinable. However, in the generated Core, I see the following definition:

Rec {
-- RHS size: {terms: 16, types: 11, coercions: 0, joins: 0/0}
Main.main_$smapMyList [Occ=LoopBreaker] :: MyList Int -> MyList Int
[GblId, Arity=1, Str=<S,1*U>, Unf=OtherCon []]
Main.main_$smapMyList
  = \ (sc_s2Rb :: MyList Int) ->
      case sc_s2Rb of {
        Nil -> Main.Nil @Int;
        Cons a_aBe rest_aBf ->
          Main.Cons
            @Int
            (case a_aBe of { GHC.Types.I# x_a24u ->
             GHC.Types.I# (GHC.Prim.*# x_a24u 2#)
             })
            (Main.main_$smapMyList rest_aBf)
      }
end Rec }

In particular, smapMyList no longer takes a function as a parameter, and the (* 2) is inlined here. I wanted to know which optimization pass is generating this code, and how it works?

I found this related issue, which seems to be asking for a way to guarantee this behavior using a SPECIALIZE pragma, which led me to believe that the specialization pass is doing this. However, when reading the documentation on specialization in GHC, it seems to be for specializing typeclass dictionaries, not function arguments (there are no typeclasses in my example).

I also looked at static argument transformation which seems related; for example the GHC source says this about the pass:

May be seen as removing invariants from loops: Arguments of recursive functions that do not change in recursive calls are removed from the recursion, which is done locally and only passes the arguments which effectively change.

However, I tried disabling both of these passes with -fno-static-argument-transformation -fno-specialise and found that this transformation still happens anyway.

My motivation for asking this is that I'm implementing this transformation in another language (Koka) so I'd like to understand how other languages do it.

Complete test program

Generated core (after disabling specialisation and static argument transformation)

CodePudding user response：

The optimization is called "call-pattern specialization" (a.k.a. SpecConstr) this specializes functions according to which arguments they are applied to. The optimization is described in the paper "Call-pattern specialisation for Haskell programs" by Simon Peyton Jones. Since that paper there seem to have been two important changes:

SpecConstr can apply to any call in the same module, not just recursive calls inside a single definition.
SpecConstr can apply to functions as arguments, not just constructors. However, it doesn't work for lambdas, unless they have been floated out by full laziness.

Here is the relevant part of the core that is produced without this optimization, using -fno-spec-constr, and with the -dsuppress-all -dsuppress-uniques -dno-typeable-binds flags:

Rec {
mapMyList
  = \ @ a @ b f ds ->
      case ds of {
        Nil -> Nil;
        Cons a1 rest -> Cons (f a1) (mapMyList f rest)
      }
end Rec }

Rec {
main_go
  = \ x ->
      Cons
        (I# x)
        (case x of wild {
           __DEFAULT -> main_go ( # wild 1#);
           5# -> Nil
         })
end Rec }

main3 = \ ds -> case ds of { I# x -> I# (*# x 2#) }

main2
  = $fShowMyList_$cshowsPrec
      $fShowInt $fShowMyList1 (mapMyList main3 (main_go 1#))

main1 = main2 []

main = hPutStr' stdout main1 True

I think this optimization is a bit disappointing because it only applies to uses within the same module. Also, for a long time (since the "Stream Fusion. From Lists to Streams to Nothing at All" paper from 2007) people have hoped that this optimization could optimize stream fusion. However, as I understand it, nobody has been able to make it work properly and reliably for that purpose (see this GHC issue). So now we still use the inferior foldr/build fusion in base.

I started a discussion about possibilities of improving the status quo in this GHC issue. I think the most promising line of future work would be to improve the static argument transform optimization. See this comment by Sebastian Graf.

I have done some more debugging, specifically I have used the -dverbose-core2core option and inspected the results. As I expected the optimization happens because of the SpecConstr optimization. Here is the core before SpecConstr (after a Simplifier pass):

Rec {
mapMyList
  = \ @ a @ b f ds ->
      case ds of {
        Nil -> Nil;
        Cons a rest -> Cons (f a) (mapMyList f rest)
      }
end Rec }

lvl = \ ds -> case ds of { I# x -> I# (*# x 2#) }

main
  = case toList (mapMyList lvl (fromList (eftInt 1# 5#))) of {
      ...

And here is the core after SpecConstr:

Rec {
$smapMyList
  = \ sc ->
      case sc of {
        Nil -> Nil;
        Cons a rest -> Cons (lvl a) (mapMyList lvl rest)
      }

mapMyList
  = \ @ a @ b f ds ->
      case ds of {
        Nil -> Nil;
        Cons a rest -> Cons (f a) (mapMyList f rest)
      }
end Rec }

lvl = \ ds -> case ds of { I# x -> I# (*# x 2#) }

main
  = case toList (mapMyList lvl (fromList (eftInt 1# 5#))) of {
      ...

So, you can see that SpecConstr creates that $smapMyList function which is a version of mapMyList specialized to the lvl argument, which is the *2 function.

Note that this specialized function is not used yet, that is done using a newly created rewrite rule which fires afterwards when the Simplifier runs. If you use the flag -ddump-rule-rewrites you can see them in action:

Rule fired
    Rule: SC:mapMyList0
    Module: (Main)
    Before: mapMyList TyArg Int TyArg Int ValArg lvl ValArg rest
    After:  (\ sc -> $smapMyList sc) rest
    Cont:   Stop[BoringCtxt] MyList Int
Rule fired
    Rule: SC:mapMyList0
    Module: (Main)
    Before: mapMyList
              TyArg Int TyArg Int ValArg lvl ValArg fromList (eftInt 1# 5#)
    After:  (\ sc -> $smapMyList sc) (fromList (eftInt 1# 5#))
    Cont:   StrictArg toList
            Select nodup wild
            Stop[RhsCtxt] String

This is the core after that subsequent Simplifier pass (it has also inlined lvl):

Rec {
$smapMyList
  = \ sc ->
      case sc of {
        Nil -> Nil;
        Cons a rest ->
          Cons (case a of { I# x -> I# (*# x 2#) }) ($smapMyList rest)
      }
end Rec }

main
  = case toList ($smapMyList (fromList (eftInt 1# 5#))) of {
      ...

This also suggests a reason why full laziness is also necessary for this optimization: the lvl function needs to be floated out because SpecConstr doesn't work on lambdas. This floating out requires full laziness.