Why is using rdeepseq not sufficient to finish reading from a handle before closing it?

Time:12-16

I've been slowly making my way through Real World Haskell. In Chapter 24, the authors detail a program for reading a file in chunks and then processing it MapReduce-style. However, it fails with hGetBufSome: illegal operation (handle is closed). The program, pared down to an MRE and updated to modern Haskell, is as follows:

import Control.Exception (finally)
import Control.Parallel.Strategies (NFData, rdeepseq)
import qualified Data.ByteString.Lazy.Char8 as LB
import GHC.Conc (pseq)
import System.Environment (getArgs)
import System.IO

main :: IO ()
main = do
  args <- getArgs
  res <- chunkedReadWith id (head args)
  print res

chunkedReadWith ::
  (NFData a) =>
  (LB.ByteString -> a) ->
  FilePath ->
  IO a
chunkedReadWith process path = do
  (chunk, handle) <- chunkedRead path
  let r = process chunk
  -- the RHS of finally is for some reason being run before the handle is
  -- finished being used. removing it allows the program to run, with the obvious
  -- disadvantage of leaving closing the handle to the garbage collector
  (rdeepseq r `seq` return r) `finally` hClose handle

chunkedRead ::
  FilePath ->
  IO (LB.ByteString, Handle)
chunkedRead path = do
  h <- openFile path ReadMode
  chunk <- LB.take 64 <$> LB.hGetContents h
  rdeepseq chunk `pseq` return (chunk, h)

I suspect this is a problem with inadequately forcing strict evaluation, but my current understanding of seq/pseq and Strategies tells me that the program as written should work, because reduction to normal form should mean that the handle has already been read from by the time hClose is evaluated. What have I missed?

On a small side note, it's unclear why the authors chose to use seq in one place and pseq in the other, but since my example has removed any parallel operation, it shouldn't (and indeed doesn't) make a difference.

CodePudding user response:

Quoting from this comment on the bug I filed,

The NFData instance for LazyByteString is correct, albeit perhaps written obtusely. Note that the Chunk constructor's S.ByteString field is a strict field, and the NFData instance for StrictByteString evaluates only to WHNF.

The problem is elsewhere: It's that rdeepseq chunk is an Eval LazyByteString object that can reach WHNF (as witnessed by seq or pseq) before chunk has actually been deepseq'ed. Try withStrategy rdeepseq chunk instead.

In other words, merely applying rdeepseq to a value is not enough: rdeepseq r only builds an Eval action, and forcing that action to WHNF with seq or pseq does not run it. Instead, we must use withStrategy (or, alternatively, using) to actually run the strategy. It seems likely that rnf from the 1.x Strategies API behaved differently here. The rnf in Control.DeepSeq returns (), so forcing rnf r with seq does fully evaluate r.
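To make the distinction concrete, here is a minimal sketch, assuming a recent version of the parallel package. The names poisoned and forces are hypothetical helpers invented for this illustration. The list contains an element that throws when evaluated, so we can observe whether a given expression actually deep-forces it when reduced to WHNF.

```haskell
import Control.Exception (SomeException, evaluate, handle)
import Control.Parallel.Strategies (rdeepseq, runEval, withStrategy)

-- Hypothetical test value: the spine is fine, but the last element
-- throws as soon as anything evaluates it.
poisoned :: [Int]
poisoned = [1, 2, error "element was forced"]

-- Hypothetical helper: evaluate the argument to WHNF and report whether
-- doing so threw an exception, i.e. whether the bad element was reached.
forces :: a -> IO Bool
forces x = handle onErr (evaluate x >> return False)
  where
    onErr :: SomeException -> IO Bool
    onErr _ = return True

main :: IO ()
main = do
  -- Only builds an Eval action; WHNF of the action does not run it.
  bare <- forces (rdeepseq poisoned)
  -- withStrategy runs the Eval action, so the list is deep-forced.
  applied <- forces (withStrategy rdeepseq poisoned)
  -- Running the Eval action by hand has the same effect.
  run <- forces (runEval (rdeepseq poisoned))
  putStrLn ("rdeepseq alone forces the list:        " ++ show bare)
  putStrLn ("withStrategy rdeepseq forces the list: " ++ show applied)
  putStrLn ("runEval . rdeepseq forces the list:    " ++ show run)
```

If the explanation quoted above is right, only the first check should report that nothing was forced, which mirrors what happens to the lazy ByteString in the question: the handle is closed while the chunk is still an unevaluated read.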

Concretely, replacing the offending line with the following fixes the problem:

  (withStrategy rdeepseq r `seq` return r) `finally` hClose handle

Alternatively, using rnf from Control.DeepSeq, we could more concisely say

  (rnf r `seq` return r) `finally` hClose handle

or even

  (r `deepseq` return r) `finally` hClose handle
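For reference, here is a minimal self-contained sketch of the question's MRE using the deepseq variant. It is an illustration, not the book's code: chunkedRead is folded into chunkedReadWith for brevity, and the scratch file name chunked-read-demo.txt is a made-up placeholder so the program can run without arguments.

```haskell
import Control.DeepSeq (NFData, deepseq)
import Control.Exception (finally)
import qualified Data.ByteString.Lazy.Char8 as LB
import System.IO

-- Read the first 64 bytes lazily, process them, and fully evaluate the
-- result before the handle is closed.
chunkedReadWith :: NFData a => (LB.ByteString -> a) -> FilePath -> IO a
chunkedReadWith process path = do
  h <- openFile path ReadMode
  chunk <- LB.take 64 <$> LB.hGetContents h
  let r = process chunk
  -- deepseq forces r (and therefore the lazy read) when this action is
  -- run, which happens before finally runs hClose.
  (r `deepseq` return r) `finally` hClose h

main :: IO ()
main = do
  let path = "chunked-read-demo.txt" -- hypothetical scratch file
  writeFile path (replicate 200 'x')
  res <- chunkedReadWith id path
  print (LB.length res)
```

Since the scratch file holds 200 bytes and the chunk is capped at 64, this should print 64 rather than failing with the handle-is-closed error.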