I've been slowly making my way through Real World Haskell. In Chapter 24, the authors detail a program for reading a file in chunks and then processing it MapReduce-style. However, it fails with the error hGetBufSome: illegal operation (handle is closed). The program, pared down to an MRE and updated to modern Haskell, is as follows:
import Control.Exception (finally)
import Control.Parallel.Strategies (NFData, rdeepseq)
import qualified Data.ByteString.Lazy.Char8 as LB
import GHC.Conc (pseq)
import System.Environment (getArgs)
import System.IO

main :: IO ()
main = do
  args <- getArgs
  res <- chunkedReadWith id (head args)
  print res

chunkedReadWith ::
  (NFData a) =>
  (LB.ByteString -> a) ->
  FilePath ->
  IO a
chunkedReadWith process path = do
  (chunk, handle) <- chunkedRead path
  let r = process chunk
  -- the RHS of finally is for some reason being run before the handle is
  -- finished being used. removing it allows the program to run, with the obvious
  -- disadvantage of leaving closing the handle to the garbage collector
  (rdeepseq r `seq` return r) `finally` hClose handle

chunkedRead ::
  FilePath ->
  IO (LB.ByteString, Handle)
chunkedRead path = do
  h <- openFile path ReadMode
  chunk <- LB.take 64 <$> LB.hGetContents h
  rdeepseq chunk `pseq` return (chunk, h)
I suspect this is a problem with inadequately forcing strict evaluation, but my current understanding of seq/pseq and Strategies tells me that the program as written should work, because reduction to normal form should mean that the handle has already been read from by the time hClose is evaluated. What have I missed?
On a small side note, it's unclear why the authors chose to use seq in one place and pseq in the other, but since my example has removed any parallel operation, it shouldn't (and indeed doesn't) make a difference.
CodePudding user response:
Quoting from this comment on the bug I filed:
The NFData instance for LazyByteString is correct, albeit perhaps written obtusely. Note that the Chunk constructor's S.ByteString field is a strict field, and the NFData instance for StrictByteString evaluates only to WHNF.
The problem is elsewhere: It's that rdeepseq chunk is an Eval LazyByteString object that can reach WHNF (as witnessed by seq or pseq) before chunk has actually been deepseq'ed. Try withStrategy rdeepseq chunk instead.
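In other words, merely applying rdeepseq is not enough: it only builds an Eval action, and forcing that action with seq or pseq does not run it. Instead, we must use withStrategy (or, alternatively, using) to actually run the strategy and obtain the fully evaluated value. It seems likely that rnf from the 1.x API of Control.Parallel.Strategies had slightly different behavior. There is an rnf in Control.DeepSeq that seems to behave the way the book's code expects.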
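The difference between building a strategy and running it can be checked with a small experiment. The following is only a sketch: hitsError and poisoned are hypothetical names introduced for illustration, and it assumes a recent parallel 3.x release. The list hides an error in its last element, so whichever expression fully evaluates the list will raise it. Under that assumption, I would expect the first check to report False (the Eval action reaches WHNF without touching the list) and the second to report True.

```haskell
import Control.Exception (ErrorCall, evaluate, try)
import Control.Parallel.Strategies (rdeepseq, withStrategy)

-- Hypothetical helper: evaluates its argument to WHNF and reports
-- whether doing so raised the error hidden in the list below.
hitsError :: a -> IO Bool
hitsError x = do
  r <- try (evaluate (x `seq` ())) :: IO (Either ErrorCall ())
  return (either (const True) (const False) r)

-- A list that can only be fully evaluated by hitting the error.
poisoned :: [Int]
poisoned = [1, 2, error "element was forced"]

main :: IO ()
main = do
  -- Forces only the Eval action that rdeepseq returns.
  built <- hitsError (rdeepseq poisoned)
  -- Runs the strategy, then forces the result.
  ran <- hitsError (withStrategy rdeepseq poisoned)
  putStrLn ("rdeepseq alone forced the list: " ++ show built)
  putStrLn ("withStrategy rdeepseq forced the list: " ++ show ran)
```

This mirrors the failing program: rdeepseq r `seq` return r returns immediately, so finally closes the handle before the lazy ByteString has been read.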
Concretely, replacing the offending line in chunkedReadWith with the following (and importing withStrategy from Control.Parallel.Strategies) fixes the problem:
(withStrategy rdeepseq r `seq` return r) `finally` hClose handle
Alternatively, using rnf from the deepseq package (Control.DeepSeq), we could more concisely say
(rnf r `seq` return r) `finally` hClose handle
or even
(r `deepseq` return r) `finally` hClose handle
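For reference, here is a minimal sketch of the whole example with the deepseq variant applied. It is not the book's code: the two functions are merged into one for brevity, and the file name chunked-read-demo.txt is a hypothetical path that the program creates itself so it can run without arguments.

```haskell
import Control.DeepSeq (NFData, deepseq)
import Control.Exception (finally)
import qualified Data.ByteString.Lazy.Char8 as LB
import System.IO

-- Reads at most 64 bytes lazily, applies the processing function, and
-- fully evaluates the result before the handle is closed.
chunkedReadWith :: NFData a => (LB.ByteString -> a) -> FilePath -> IO a
chunkedReadWith process path = do
  h <- openFile path ReadMode
  chunk <- LB.take 64 <$> LB.hGetContents h
  let r = process chunk
  -- deepseq forces r to normal form here, so the lazy read happens
  -- while the handle is still open.
  (r `deepseq` return r) `finally` hClose h

main :: IO ()
main = do
  let path = "chunked-read-demo.txt" -- hypothetical demo file
  writeFile path (replicate 100 'x')
  res <- chunkedReadWith id path
  print (LB.length res)
```

Because the result is in normal form before hClose runs, reading res afterwards no longer touches the closed handle.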