Prepend (:) function definition in Haskell-CodePudding

I would like to better understand some code I am looking at in a course I am taking. I used Ghci to deduce the type of (:) to be (:) :: a -> [a] -> [a]. I would like to know exactly how that function is implemented, but I cannot find anything.

CodePudding user response：

The first level of answer is that it's just the data constructor for lists. If the syntax were valid, you could imagine:

data [a] = [] | a : [a]

Compare the definition for, say, Complex:

data Complex a = a :  a

Or Two:

data Two a = Two a a

You could define your own list-alikes if you wanted; say

data List a = Nil | Cons a (List a)

and then (:) and Cons would have essentially the same implementation.

But then the obvious follow-up question is: how are data constructors implemented? Answering that question takes us out of the realm of Haskell-the-language and into the realm of a specific Haskell implementation. A mental model that has served me well for GHC's implementation, despite being a little wrong, is this: when you apply a data constructor to enough arguments, you get a new pointer to a chunk of memory that has a number (telling which of the many constructors it is) followed by some more pointers (one for each field of the constructor). So, for lists, you would have something like this:

term   | memory
------- -----------------------------------------------------
       | ------------
[]     | | 00000000 |
       | ------------
       |     tag
       |
       | ----------------------------------------------------
a : as | | 00000001 | 00101110100111000 | 00001010111011000 |
       | ----------------------------------------------------
       |     tag      pointer to a        pointer to as

CodePudding user response：

: is one of the data constructors for the list type, so it isn't a function with ordinary Haskell code as implementation like . The special syntax for lists gets in the way of explanation a bit, so I'll start with a different type and come back to real lists. Let's look at this one:

data List a
  = Nil
  | Cons a (List a)

This defines two constuctors: Nil and Cons. Nil doesn't have any parameters, so it just is a value of type List a (for any a). Cons has two parameters; it takes an a and a List a to result in a value of type List a.

That means we can use Cons exactly as an ordinary function. We pass it arguments and get a return value. Which invites us to wonder what the implementation of that function is. Ordinary functions are defined by code; When you apply a function you run its code¹, and whatever that code produces is the return value of the function. What code is run when you apply a data constructor?

The answer is actually none. Data constructors are special, even though we can apply them just like we can apply functions. Data constructors are how we introduce new kinds of value into our program, so they can't be defined using code (which would have to result in some combination of existing things, not new kinds of value).

Data constructors are also the fundamental building block of pattern matching. You can inspect a value to see whether it is the Cons constructor applied to two other values, and if so get access to them. When you have a value produced by code rather than by data constructor, there is no way of telling if it is the produce of a particular function (and even more impossible would be to "un-apply" the code and figure out what the arguments were). But data constructors can be un-applied this way, and that's exactly what makes pattern matching work.

The way Haskell provides these two features of data constructors (making new kinds of values, and being able to un-apply them to get the argument values), is not to implement them by running code. Instead when we say Cons True Nil, the arguments are simply stored as-is. Cons True Nil doesn't run any code to produce something; a Cons is just a value with two slots in it, and a Cons True Nil is just a Cons with those slots filled with a True and a Nil. This guarantees that this is really a new kind of value that couldn't have been made by any existing type (since they all have their own data constructors, not the new Cons). And it makes it easy to tell what the arguments to Cons were: Haskell can just look in the two slots.²

I promised I would bring this back to real Haskell lists, so lets do that. The type [a] and its constructors are built-in to Haskell, but they behave exactly the same as the home-made list type I defined above! It's sometimes a little hard to see how "normal" they are because of the unusual syntax used, but have a look at these:

-- definition 1
data List a
  = Nil
  | Cons a (List a)

-- definition 2
data List a
  = Nil
  | a `Cons` List a

-- definition 3
data [] a
  = []
  | (:) a ([] a)

-- definition 4
data [a]
  = []
  | a : [a]

All of these define a type with the same structure (which is the same structure of the builtin list type).

Definitions 1 and 2 in fact define exactly the same type; you can use backticks (`) around a name with two parameters to apply it infix between its arguments instead of before. This works with constructor names, and you can do it in the definition of the constructors are well.

Definitions 3 and 4 are the similar pattern but using the actual "names" used by the built in list type. You cannot actually enter either of these in your Haskell code only because it's not valid to use the special list syntax in your own definitions, but the built in list type behaves as-if it was defined by one of these. In definition 3 I've applied [] type constructor³ using ordinary prefix application ([] a just like I used List a in my other two definitions), and similarly applied the : constructor using prefix notation (surrounding the operator : with parentheses, as we can do with any operator like ( ) 1 2). While definition 4 mirrors the pattern of definition 2, using an infix constructor :, and also fully uses the unique [a] syntax for the list type.

So you see, : actually has no implementation (at the Haskell level). True : [] just is the : constructor applied to the arguments True and []; it does not do any work to evaluate to anything else. It is also handy as a "prepend operator", but the way the list type is defined makes : special. For example, the operator (for appending) needs to be defined in terms of prepending because we defined the list structure in terms of prepending (i.e. :). Prepending does not need to be defined in terms of anything else, because it is what we define lists in terms of. That's why you can't find any definition for : (well, that and the fact that the standard list type is built in to the compiler, so you can't even find a data declaration for it).

¹ With the parameter variables bound to the argument values you supplied.

²These abilities to "just store values" and "look in the slots" is something that is just a built-in fundamental concept of Haskell. They are the building blocks in terms of which the language Haskell is defined, rather than being something that is coded in ordinary Haskell. But they do of course have an implementation in lower level code that the compiler supplies; Daniel Wagner's answer talks a little bit about that, if you are interested.

³ This is one of the things that makes it hard to see how "normal" the list type is, having special rules for how you can write lists but none at all for how they behave. On the value level, the [] symbol is a data constructor with zero arguments, and surrounding something with square brackets like [True] is special syntactic sugar constructing a list (involving both constructors, in our case: True : []).

While on the type level, the same symbol [] is a type constructor with one argument, and wrapping something in square brackets like [Bool] is special syntactic sugar for applying the [] type constructor to it (in our case: [] Bool).

If I were redefining the Haskell language from scratch with the benefit of hindsight, I would name the list type constructor List, keeping the square bracket syntax solely for defining list values at the term level. I might even go so far as to use a "normal" name for the empty list constructor rather than [], leaving square brackets purely involved in the special syntactic sugar and not at all involved in the actual definition of the underlying type, which would then be perfectly ordinary in spelling as well as behaviour. The syntax for list types is cute, and very slightly shorter, but I think it ultimately hinders understanding rather than promoting it. (For beginners; once you're sufficiently used to it either way works fine)

Although [] is quite satisfying as a symbol for the empty list constructor, since the square bracket syntactic sugar would still define [] as a way of writing an empty list regardless, so maybe I'd be tempted to keep that.