Haskell MVar & Text laziness comparison


I benchmarked the following code example with different variations of normal/lazy text and normal/strict MVar:

import           Control.Concurrent.MVar
import qualified Data.Text                     as T

main :: IO ()
main = do
    mvar <- newMVar T.empty

    let textArr    = map (const $ T.pack "01234567890123456789") [0 .. 15000 :: Int]
        mvarWriter = \newText -> modifyMVar_ mvar (\oldText -> return $ oldText <> newText)

    mapM_ mvarWriter textArr
    print . T.length =<< readMVar mvar

| Version                   | Execution time in seconds |
| ------------------------- | ------------------------- |
| Strict Text, strict MVar  | 0.26                      |
| Strict Text, lazy MVar    | 8.35                      |
| Lazy Text, strict MVar    | 17                        |
| Lazy Text, lazy MVar      | 17                        |

After reading some articles about this, I would have thought that lazy Text with a strict MVar would be the fastest, but to my surprise it is not.

Can anyone explain what is going on? Why is strict Text with a strict MVar so much faster than strict Text with a normal (lazy) MVar? And why is lazy Text so slow no matter the strictness of the MVar?

CodePudding user response:

Lazy vs Strict Text

First of all, lazy text is like a linked list of strict texts. The <> function traverses the whole list and adds its right argument to the end of the list. That means the lazy text version ends up with a linked list with 15000 elements. And every time an element is added the program traverses this whole list until it reaches the end and can append the element.
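
To make that concrete, here is a small standalone sketch (my own, not from the question) that builds a lazy Text by repeated appends and counts the resulting chunks with Data.Text.Lazy.toChunks:

import qualified Data.Text      as T
import qualified Data.Text.Lazy as TL

main :: IO ()
main = do
    let piece  = TL.fromStrict (T.pack "01234567890123456789")
        -- append the 20-character piece 15000 times, as in the benchmark
        result = foldl (<>) TL.empty (replicate 15000 piece)
    -- toChunks exposes the strict chunks that make up a lazy Text
    print (length (TL.toChunks result))  -- 15000 chunks
    print (TL.length result)             -- 300000 characters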

The strict <> is just copying two regions of memory to a new region. That is a cheaper operation, because this can make use of SIMD operations to copy up to 64 characters at a time (which is more than a whole chunk of lazy text). Also, this is much better for cache locality compared to the linked list pointers which could be anywhere in memory.

Finally, there is a lot of memory overhead in the lazy text: each chunk has to store a constructor header (equivalent to 8 chars) and a pointer to the next chunk (8 chars), the strict Text inside the chunk contains a length (8 chars) and an offset (8 chars) for slicing, and the underlying ByteArray# has another length (8 chars). So the lazy version stores the equivalent of 40 extra characters for every chunk of 20 characters.
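
As a back-of-the-envelope check (using the estimates above, not measured values), that overhead dwarfs the payload:

main :: IO ()
main = do
    let chunks        = 15000 :: Int
        payloadChars  = 20   -- "01234567890123456789"
        overheadChars = 40   -- chunk header + next pointer + length + offset + array length
    print (chunks * payloadChars)   -- 300000 characters of payload
    print (chunks * overheadChars)  -- 600000 characters' worth of bookkeeping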

I should also note that lazy text differs from a list of strict texts in one important way: the strict text is unpacked into the lazy text chunks. That unpacking saves a level of indirection, but it also prevents sharing between chunks. In this case every chunk contains exactly the same text, so it could all be shared. I will come back to this in the next part.

Lazy vs Strict MVar

It's not really about the strictness of the MVar specifically; that's just a convenient way to force the value. You'd probably get the same results if you used $! here:

mvarWriter = \newText -> modifyMVar_ mvar (\oldText -> return $! oldText <> newText)

(Or $!! if you use lazy text)
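
For the lazy Text variants, a forced writer might look like this (a sketch, assuming the deepseq package; $!! evaluates the whole chunk list before the result goes back into the MVar):

import           Control.Concurrent.MVar
import           Control.DeepSeq         (($!!))
import qualified Data.Text.Lazy          as TL

main :: IO ()
main = do
    mvar <- newMVar TL.empty

    let textArr    = map (const $ TL.pack "01234567890123456789") [0 .. 15000 :: Int]
        -- $!! fully evaluates the appended lazy Text before it is stored
        mvarWriter = \newText -> modifyMVar_ mvar (\oldText -> return $!! oldText <> newText)

    mapM_ mvarWriter textArr
    print . TL.length =<< readMVar mvar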

The difference between the lazy version and the strict version is that the lazy version doesn't actually compute oldText <> newText before putting it into the MVar. The lazy version postpones that computation until it encounters that print . T.length =<< readMVar mvar line.

How does (GHC) Haskell store the computation so that it can run it at a later point in time? As a closure on the heap. The closure stores pointers to all the values that originate from outside the function (the free variables). In this case that is just newText.

So actually, the strict Text lazy MVar version is very similar to the lazy Text version: both construct a kind of linked list structure on the heap. This takes extra space and allocation time, and it adds indirections.
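
A way to see the same distinction without the MVar (my own illustration, not from the question): the lazy-MVar version behaves roughly like foldl, which leaves a chain of unevaluated (<>) closures on the heap, while the forced version behaves roughly like foldl', which evaluates each intermediate Text as it goes.

import           Data.List (foldl')
import qualified Data.Text as T

main :: IO ()
main = do
    let pieces = replicate 15000 (T.pack "01234567890123456789")
    -- foldl:  builds ~15000 nested closures first, only forced at T.length
    print (T.length (foldl  (<>) T.empty pieces))
    -- foldl': forces each intermediate Text immediately, like the strict MVar
    print (T.length (foldl' (<>) T.empty pieces))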

One difference compared to the lazy Text is that the strict Text lazy MVar version doesn't have to traverse the whole structure each time it adds a new text. Also, this implicit linked list through closures has the advantage that it can share pointers in the structure. So in the beginning there will just be a single "01234567890123456789" text and many pointers to it.
