Home > Software engineering >  How is integer comparison implemented in GHC?
How is integer comparison implemented in GHC?

Time:11-09

At first, I wanted to look into how Integer was deriving from the classOrd

I got that definition in GHC.Classes

instance Ord Integer where
    (<=) = leInteger
    (>)  = gtInteger
    (<)  = ltInteger
    (>=) = geInteger
    compare = compareInteger

compareInteger is pointing to another source file, GHC.Integer.Type. it is defined like this :

compareInteger :: Integer -> Integer -> Ordering
compareInteger (Jn# x)  (Jn# y) = compareBigNat y x
compareInteger (S#  x)  (S#  y) = compareInt#   x y
compareInteger (Jp# x)  (Jp# y) = compareBigNat x y
compareInteger (Jn# _)  _       = LT
compareInteger (S#  _)  (Jp# _) = LT
compareInteger (S#  _)  (Jn# _) = GT
compareInteger (Jp# _)  _       = GT

S# is for fixed-size integers, Jn# and Jp# for arbitrary size integers.

In GHC.Classes (from the package ghc-prim) I was able to find a definition for compareInt#. The occurence of unusual types like Int# signaled me I was getting closer.

compareInt# :: Int# -> Int# -> Ordering
compareInt# x# y#
    | isTrue# (x# <#  y#) = LT
    | isTrue# (x# ==# y#) = EQ
    | True                = GT

Still going deeper, I got this definition for the operator (GHC.Prim module)

infix 4 <#
(<#) :: Int# -> Int# -> Int#
(<#) = (<#)

But this is a deep I was able to get. <# is refering to itself. We don’t know what it’s doing.

(isTrue# is simply a function that “Returns True if its parameter is 1# and False if 0#”)

Where can I find the source, the actual place where the job is getting done ? Is there some assembly at the very bottom ? Where do I find this sacred place ?

CodePudding user response:

First of all, technically when you enter the GHC.Integer.Type module you leave the realm of Haskell and enter the realm of the current implementation that GHC uses, so this question is about GHC Haskell specifically.

All the primitive operations like (<#) are implemented as a recursive loop which you have found in the GHC.Prim module. From there the documentation tells us the next place to look is the primops.txt.pp file where it is listed under the name IntLtOp.

Then the documentation mentioned earlier says there are two groups of primops: in-line and out-of-line. In-line primops are resolved during the translation from STG to Cmm (which are two internal representations that GHC uses) and can be found in the GHC.StgToCmm.Prim module. And indeed the IntLtOp case is listed there and it is transformed in-line using mainly the mo_wordSLt function which depends on the platform.

This mo_wordSLt function is defined in the GHC.Cmm.MachOp module which contains to quote:

Machine-level primops; ones which we can reasonably delegate to the native code generators to handle.

The mo_wordSLt function produces the MO_S_Lt constructor of the MachOp data type. So we can look further into a native code generator to see how that is translated into low-level instructions. There is quite a bit of choice in platforms: SPARC, AArch64, LLVM, C, PPC, and X86 (I found all these with the search function on GitLab).

X86 is the most popular platform, so I will continue there. The implementation uses a condIntReg helper function, which is defined as follows:

condIntReg :: Cond -> CmmExpr -> CmmExpr -> NatM Register

condIntReg cond x y = do
  CondCode _ cond cond_code <- condIntCode cond x y
  tmp <- getNewRegNat II8
  let
        code dst = cond_code `appOL` toOL [
                    SETCC cond (OpReg tmp),
                    MOVZxL II8 (OpReg tmp) (OpReg dst)
                  ]
  return (Any II32 code)

Also interesting is the definition of condIntCode, which depends on the operands of the condition. It is a large function, so I will not reproduce the full code here, but the general case produces a CMP instruction:

-- anything vs anything
condIntCode' _ cond x y = do
  platform <- getPlatform
  (y_reg, y_code) <- getNonClobberedReg y
  (x_op, x_code) <- getRegOrMem x
  let
        code = y_code `appOL`
               x_code `snocOL`
                  CMP (cmmTypeFormat (cmmExprType platform x)) (OpReg y_reg) x_op
  return (CondCode False cond code)
  • Related