Let pack
be the function [a] -> [[a]]
which takes a list and groups consecutive repeated elements into sublists.
Here are two implementations of pack
in Haskell.
pack :: (Eq a) => [a] -> [[a]]
pack x = reverse $ foldl f [] x where
f cks@(ck1:_):rest) x
| x == ck1 = (x:ck):rest
| otherwise [x]:cks
f _ x = [[x]]
pack' (x:xs) = let (first,rest) = span (==x) xs
in (x:first) : pack' rest
pack' [] = []
These implementations have a critical semantic difference: the first implementation fails to terminate if we apply it to an infinite list, e.g. [1..]
. But the second implementation does work for infinite lists. For example, head $ pack' [1..]
evaluates.
My guess is the let
in
notation is lazy, hence span
(which uses let
-in
in its Prelude definition) only evaluates finitely many expressions when we apply pack'
on an infinite list.
However, this is an unsatisfactory explanation to me, because I can replace reverse
with the following definition.
reverse' = foldl (\y x0 -> x0:y) []
If we do this, every expression in pack
folds from left to right—so I would expect this to work for infinite lists—yet it still hangs.
The question: Why does pack'
work for infinite lists and not pack
?
CodePudding user response:
foldl :: Foldable f => (b -> a -> b) -> b -> f a -> b
will for a given function f
, and a base value z
for a list [x1, x2, …, xn]
produce the result of:
f (f (… (f (f z x1) x2) …) xn-1) xn
If we thus want to determine the weak head normal form (WHNF) we need to access the last element of the list. The fold function f
of the foldl
can be lazy in its first parameter, but we will at least have to make a function call with xn
as parameter. This is why the documentation on foldl
says:
Note that to produce the outermost application of the operator the entire input list must be traversed. Like all left-associative folds,
foldl
will diverge if given an infinite list.
My guess is the let in notation is lazy, hence span (which uses let-in in its Prelude definition) only evaluates finitely many expressions when we apply pack' on an infinite list.
You are correct, definitions in the let
, where
clauses and all other subexpressions are all lazy. But eventually if you are interested in the result, you need to determine the WHNF, and sometimes more than the WHNF.
The reason that it works is because span :: (a -> Bool) -> [a] -> ([a], [a])
is implemented lazily. Indeed, span
is implemented as [src]:
span :: (a -> Bool) -> [a] -> ([a],[a]) span _ xs@[] = (xs, xs) span p xs@(x:xs') | p x = let (ys,zs) = span p xs' in (x:ys,zs) | otherwise = ([],xs)
It thus does not need to know how the span
of the tail looks in order to generate a 2-tuple where x
that has satisfied the predicate is put in the first item, or in the second item if p x
failed.
This thus means that span
will generate a 2-tuple where the first item will contain all the elements that satisfy the predicate, whereas the second item is a lazy reference to the rest of the list to process.