Why my parallel code is even slower than without parallism-CodePudding

I followed Simon Marlow's book on parallel haskell(Chapter 1) using rpar/rseq .

Below is the code(Solving the Squid Game bridge simulation):

{-# LANGUAGE FlexibleContexts #-}

import Control.DeepSeq (force)
import Control.Exception (evaluate)
import Control.Parallel.Strategies
import Data.Array.IO
  ( IOUArray,
    getAssocs,
    newListArray,
    readArray,
    writeArray,
  )
import Data.Functor ((<&>))
import System.Environment (getArgs)
import System.Random (randomRIO)

game ::
  Int -> -- number of steps
  Int -> -- number of glass at each step
  Int -> -- number of players
  IO Int -- return the number of survivors
game totalStep totalGlass = go 1 totalGlass
  where
    go currentStep currentGlass numSurvivors
      | numSurvivors == 0 || currentStep > totalStep = return numSurvivors
      | otherwise = do
        r <- randomRIO (1, currentGlass)
        if r == 1
          then go (currentStep   1) totalGlass numSurvivors
          else go currentStep (currentGlass - 1) (numSurvivors - 1)

simulate :: Int -> IO Int -> IO [(Int, Int)]
simulate n game =
  (newListArray (0, 16) (replicate 17 0) :: IO (IOUArray Int Int))
    >>= go 1
    >>= getAssocs
  where
    go i marr
      | i <= n = do
        r <- game
        readArray marr r >>= writeArray marr r . (  1)
        go (i   1) marr
      | otherwise = return marr

main1 :: IO ()
main1 = do
  [n, steps, glassNum, playNum] <- getArgs <&> Prelude.map read
  res <- simulate n (game steps glassNum playNum)
  mapM_ print res

main2 :: IO ()
main2 = do
  putStrLn "Running main2"
  [n, steps, glassNum, playNum] <- getArgs <&> Prelude.map read
  res <- runEval $ do
    r1 <- rpar $ simulate (div n 2) (game steps glassNum playNum) >>= evaluate . force
    r2 <- rpar $ simulate (div n 2) (game steps glassNum playNum) >>= evaluate . force
    rseq r1
    rseq r2
    return $
      (\l1 l2 -> zipWith (\e1 e2 -> (fst e1, snd e1   snd e2)) l1 l2)
        <$> r1
        <*> r2

  mapM_ print res

main = main2

For main2, I've compiled using:

ghc -O2 -threaded ./squid.hs

and run as:

./squid 10000000 18 2 16  RTS -N2

I can't understand why main1 is faster than main2 while main2 has parallism in it.

Could anyone give me some comments on my code if this is the correct use of parallism?

CodePudding user response：

You aren't actually using any parallelism. You write

    r1 <- rpar $ simulate (div n 2) (game steps glassNum playNum) >>= evaluate . force

This sparks a thread to evaluate an IO action, not to run it. That's not useful.

Since your simulate is essentially pure, you should convert it from IO to ST s by swapping in the appropriate array types, etc. Then you can rpar (runST $ simulate ...) and actually do work in parallel. I don't think the force invocations are useful/appropriate in context; they'll free the arrays sooner, but at significant cost.