Below is the code. I understand applicative, functor, traversable and monad to a certain extent. The iterator and yield types, and yield functions are what I struggle understanding the most. For example, what are i o r in Iterator and i o r a in Yield? what does traverseY do exactly and what do the signatures mean? and How Monad Cont is applied here also confuses me a lot. Thank you for reading, any input would be appreciated.
import Control.Monad.Cont
import Control.Monad.Random
import System.Random.Shuffle
data Iterator i o r =
Result r
| Susp o (i -> Iterator i o r)
--------------------------------------------------------------------------------------------
newtype Yield i o r a = Yield { unY :: Cont (Iterator i o r) a }
instance Functor (Yield i o r) where
fmap f (Yield m) = Yield (fmap f m)
instance Applicative (Yield i o r) where
pure a = Yield (pure a)
(Yield mf) <*> (Yield ma) = Yield (mf <*> ma)
instance Monad (Yield i o r) where
return a = Yield (return a)
(Yield m) >>= k = Yield (m >>= \a -> unY (k a))
instance MonadCont (Yield i o r) where
callCC c = Yield (callCC (\k -> unY (c (\a -> Yield (k a)))))
runYield :: Yield i o r r -> Iterator i o r
runYield (Yield m) = runCont m Result
yield :: o -> Yield i o r i
yield o = callCC (\k -> Yield (cont (\_ -> Susp o (\i -> runYield (k i)))))
-------------------------------------------------------------------------------------------
data Tree a = Empty | Node a (Tree a) (Tree a)
traverseY :: Tree a -> Yield (Tree a) (a,Tree a,Tree a) r ()
traverseY Empty = return ()
traverseY (Node a t1 t2) = do t <- yield (a, t1, t2); traverseY t
My understanding here is that o seems like a temporary value that is similar to the temporary value s one would use in loops in languages like python or java to improve execution time? and loop kind of iterate over what (traverse yield tr) yields? still I am very confused.
dfsDirect :: Tree a -> [a]
dfsDirect tr = loop (runYield (traverse yield tr))
where loop (Result r) = foldr ( ) [] r
loop (Susp o i) = loop (i ((\a->[a]) o))
CodePudding user response:
An Iterator i o r
represents a process that repeatedly takes a value of type i
and outputs a value of type o
, eventually breaking the iteration after one of the i
s by returning a r
instead. E.g.
accum :: Iterator Int Int Int
accum = go 0
where go n | n >= 100 = Result n
| otherwise = Susp n (\i -> go (n i))
-- potential usage
main :: IO ()
main = go accum -- iterator returns modified copies instead of mutating, so prepare to replace the iterator through iteration/recursion
where go (Result r) = putStrLn $ "Final total: " show r -- check whether iterator has values/extract values by pattern matching; a finished iterator can return extra data of type r if it likes
go (Susp o i) = do
putStrLn $ "Running total: " show o
putStr $ "Add: "
n <- readLn
go (i n) -- bidirectional communication! get "incremented" iterator by feeding in an input value (you could write no-input iterators by setting i = ())
is an iterator that keeps track of some accumulator n
. It adds each input to the accumulator and each time outputs the new accumulator. Once the accumulator reaches 100, it stops accepting input and returns the final value. Note that, since everything is immutable, in order to keep state, the Iterator
has to return a new version of itself every time it changes state. Whoever is iterating "through" accum
in turn has to use the returned Iterator
instead of accum
itself. In Python, you could write accum
as:
def accum(): # calling accum() instead creates a new object that mutates itself through iteration
sum = 0
while sum < 100: sum = sum (yield sum)
return sum
# same usage
def main():
gen = accum()
o = next(gen)
try: # I was hoping the Python version would help understanding... but I think I overestimated how pythonic Python would be...
while True:
print(f"Running total: {o}")
o = gen.send(int(input("Add: ")))
except StopIteration as e:
print(f"Final total: {e.value}")
Yield i o r r
is being used as a "builder" for Iterator i o r
. You can write an analogue of Yield
's interface for Iterator
directly:
instance Functor (Iterator i o) where
fmap f (Result x) = Result (f x)
fmap f (Susp o i) = Susp o (fmap f . i)
instance Applicative (Iterator i o) where
pure = Result
liftA2 = liftM2
instance Monad (Iterator i o) where
Result r >>= f = f r
Susp o i >>= f = Susp o ((>>= f) . i)
-- yieldI x is the iterator that sends x to the caller and receives a value in return
-- in Python: def yieldI(x): return yield x
yieldI :: o -> Iterator i o i
yieldI x = Susp x Result
E.g. the generator in your DFS example is
data Tree a = Empty | Node a (Tree a) (Tree a)
dfsI :: Tree a -> Iterator b a (Tree b) -- yield elements of the tree (o = a) *and also* receive new elements to replace them (i = b), building a new tree (r = Tree b)
-- dfs = traverse yieldI
dfsI Empty = Result Empty
dfsI (Node l m r) = do
l' <- yieldI l
m' <- dfsI m
r' <- dfsI r
return (Node l' m' r')
The issue with using Iterator i o r
directly here is that it is inefficient. (Remember that Haskell is lazily evaluated to understand the following.) If you "concatenate" many iterators, like ((x >>= f) >>= g) >>= h
, then you run into trouble when you try to evaluate it. Say x
evaluates to Susp o i
. Then evaluating the bigger expression first does three function calls, into the >>=
s, evaluates x
to Susp o i
, then creates new suspended function calls to produce Susp o (\a -> ((i a >>= f) >>= g) >>= h)
. When you iterate through this iterator (i.e. extract the lambda and call it with some argument), each iteration must walk through all the >>=
s that are hanging on to the Iterator
. Yikes. (Stated perhaps in more familiar terms, we've implemented iterator concatenation as a "wrapper" around another, which gets bad when you have wrappers on wrappers on wrappers...)
Using Cont
is a "standard trick" for avoiding this. The idea is that, instead of handling an iterator x
directly, we handle its bind function (wrapped in Cont
and Yield
newtypes) x' = \f -> x >>= f
. Note that converting a monadic computation to its bind function is reversible, since x' return = x >>= return = x
.
newtype Yield i o r a = Yield { unYield :: (a -> Iterator i o r) -> Iterator i o r }
instance Functor (Yield i o r) where
fmap f (Yield r) = Yield (\k -> r $ k . f)
instance Applicative (Yield i o r) where
pure x = Yield (\k -> k x)
liftA2 = liftM2
instance Monad (Yield i o r) where
Yield r >>= f = Yield (\k -> r $ \x -> unYield (f x) k)
-- newtype Yield i o r a = Yield { unYield :: (a -> Iterator i o r) -> Iterator i o r }
-- compare (>>=) :: Iterator i o a -> (a -> Iterator i o r) -> Iterator i o r
-- giving
liftIY :: Iterator i o a -> Yield i o r a
liftIY x = Yield (x >>=)
-- and the law x >>= return = x inspires
runYield :: Yield i o r r -> Iterator i o r
runYield (Yield r) = r return
-- e.g.
yield :: o -> Yield i o r i
-- yield = liftIY . yieldI
yield x = Yield (\k -> Susp x (\i -> k i)) -- note this is the same as yours after you drill through newtype baggage
Instead of having ((x >>= f) >>= g) >>= h
, using Yield
will make terms like (((x' >>= liftIY . f) >>= liftIY . g) >>= liftIY . h) return
appear at runtime. This evaluates to just x' >>= _someFunction
, so all the wrappers from before have collapsed into just one, hopefully leading to efficiency. (This collapsed wrapper will "replace itself" as you iterate through, going through the behaviors specified by f
, g
, and h
in turn. This is encoded in Yield
's >>=
.)
-- magical free efficiency by replacing Iterator -> Yield, yieldI -> yield
dfs :: Tree a -> Yield b a r (Tree b)
dfs = traverse yield
-- when using Yields as builders, you will treat Yield i o _ r like Iterator i o r
-- the middle parameter should be polymorphic for builders like dfs
-- while the final consumer (particularly the "standard consumer" runYield) fixes it to something (runYield sets it to the final return type)
-- (this behavior is a loosely typed reflection of the Codensity monad)
At the final use site, Yield
s have to be made into Iterator
s before they can be iterated. Your dfsDirect
:
- Uses
dfs = traverse yieldI
to build aYield
without incurring the wrath of "accidentally quadratic left associativity" (technical term :)) - Builds an
Iterator
out of thatYield
usingrunYield
. This iterator goes throughtr
and yields/replaces its elements. - Iterates that iterator via
loop
, which... - "Converses" with
dfs
via the lineloop (Susp o i) = loop (i [o])
: when it receiveso
it sends[o]
back into the iterator, which puts it in theTree
it's building. - Upon exhausting the iterator, receives a new
Tree
where every element is replaced with a singleton list (loop (Result r) = _
). - Concatenates all the lists together via the
Foldable Tree
instance.
This is a pretty dumb way to do things, since the order doesn't come from dfs
but from the Foldable Tree
instance used in the last step. The Iterator
is just used as a glorified fmap
function. If the Foldable
instance were different (e.g. BFS, or even just inorder vs preorder), but you kept dfs
as a preorder DFS (so it would no longer be written traverse yield
), dfsDirect
would not output in the actual order defined by dfs
! You could write a function that properly turns an Iterator
into a list.
-- note the usage of () as an input type for "plain" iterators
-- since we cannot know what to pass in otherwise
iToList :: Iterator () o r -> ([o], r)
iToList (Result r) = ([], r)
iToList (Susp o i) = (o : os, r)
where ~(os, r) = iToList (i ())
traverseY
is also a bit strange. If it receives a Node
(either as initial value or as an iterator input), it yields back the fields of the Node
, and on Empty
else it returns. It doesn't actually "traverse" its input; you can send it off into a completely new tree just by sending that tree as input. I assume the idea is that when you iterate over it, you send back one of the Tree
s it previously returned, so it iterates over a path to a leaf. IMO it would be nicer to write
data Direction = L | R
path :: Tree a -> Yield Direction a r () -- outputs root and goes in direction as told until it runs out of tree
path Empty = pure ()
path (Node x l r) = do
d <- yield x
case d of
L -> path l
R -> path r
-- potential use
elemBST :: Ord a => a -> Tree a -> Bool
elemBST x xs = go (runYield $ path xs)
where go (Result ()) = False -- iteration ended without success
go (Susp y i) = case compare x y of
LT -> go (i L) -- go left
EQ -> True -- end
GT -> go (i R) -- go right