Let’s say I have several very large vectors stored on disk. I need to access them one at a time by reading each from its file, which places it in memory. I would perform some function on a single vector and then move on to the next one I need. I need to be able to instruct each vector in memory to be garbage collected every time I need to access a different vector. I’m not sure whether performMajorGC would ensure that the vector is garbage collected if my program has to access that same vector again later, by referencing the same function name that read the vector in from disk. In such a case I would read it into memory again, use it, then garbage collect it. How would I ensure its garbage collection while using the same function name for the vector that is read from the same file?
I would appreciate any advice, thanks.
In response to Daniel Wagner:
myvec :: Int -> IO (Vector (Vector ByteString))
myvec x = do
    let ioy = do
            y <- Data.ByteString.Lazy.readFile ("data.csv" ++ show x)
            -- fail in IO if the file does not decode
            guard (isRight (Data.Csv.decode NoHeader y
                :: Either String (Vector (Vector ByteString))))
            return y
    yy <- ioy
    return (head $ snd $ partitionEithers [Data.Csv.decode NoHeader yy])

myvecvec :: Vector (IO (Vector (Vector ByteString)))
myvecvec = generate 100 (\x -> myvec x)

somefunc1 :: IO (Vector (Vector ByteString)) -> IO ()
somefunc1 iovv = do
    vv <- iovv
    somefunc1x1 vv  -- somefunc1x1 :: Vector (Vector ByteString) -> IO ()

-- same thing for somefunc2 and 3

oponvec :: IO ()
oponvec = do
    somefunc1 (myvecvec ! 0)
    performGC
    somefunc2 (myvecvec ! 1)
    performGC
    somefunc3 (myvecvec ! 0)
Answer:
You can test this by using a weak pointer as follows:
import qualified Data.Vector.Unboxed as V
import System.Mem.Weak
import System.Mem

main :: IO ()
main = do
    let xs = V.fromList [1..1000000 :: Int]
    wkp <- mkWeakPtr xs Nothing
    performGC
    xs' <- deRefWeak wkp
    print xs'
On my system this prints Nothing, which means that the vector has been deallocated. However, I don't know if GHC guarantees that this happens.
Here's a program which checks @amalloy's suggestion:
import qualified Data.Vector.Unboxed as V
import Control.Monad
import Data.Word
{-# NOINLINE newLarge #-}
newLarge :: Word8 -> V.Vector Word8
newLarge n = V.replicate 5000000000 n -- 5GB
main :: IO ()
main = forM_ [1..10] $ \i -> print (V.sum (newLarge i))
This uses exactly 5GB on my machine, which shows that there are never two large vectors allocated at the same time.
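Applied to the files in the question, the same idea might look roughly like the sketch below. It is only an illustration under some assumptions: the file names come from the command line, `summarize` and `processFile` are hypothetical names, and a strict ByteString stands in for the decoded CSV vector so that the example depends only on the bytestring package. The point is that each file's contents are bound only inside `processFile`, and only a small, fully evaluated result leaves it.

```haskell
import Control.Monad (forM_)
import qualified Data.ByteString as BS
import System.Environment (getArgs)

-- Hypothetical per-file computation: here, just the number of bytes.
summarize :: BS.ByteString -> Int
summarize = BS.length

-- Load one file, reduce it to a small result, and return only that result.
-- The result is forced with ($!) before returning, so no unevaluated thunk
-- is left holding on to the file contents.
processFile :: FilePath -> IO Int
processFile path = do
    contents <- BS.readFile path
    return $! summarize contents

main :: IO ()
main = do
    -- File names are taken from the command line; the same file may be
    -- listed more than once and is simply read again each time.
    paths <- getArgs
    forM_ paths $ \p -> processFile p >>= print
```

Run as, for example, `runghc Sketch.hs data0.csv data1.csv data0.csv` (hypothetical file names). Nothing outside `processFile` refers to a file's contents, so the collector is free to reclaim each buffer before or while the next one is loaded, without any explicit performGC.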
Answer:
"I need to be able to instruct each vector in memory to be garbage collected every time I need to access a different vector."
Do you? Why? If it's simply because they're large and you're worried about fitting the vector in memory, then don't worry about it. If memory space is needed, and the object is unreachable, then garbage collection will pick it up. If memory space is not needed, you don't need to do anything. And if the object is reachable, running the GC won't help. So there are no cases where manual intervention in GC will do any good.
And if you want to GC it for some other reason than freeing up memory, you need to explain that in the question, because that goal will surely affect answers.
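To make the reachable versus unreachable distinction concrete, here is a small sketch using weak pointers. It is an illustration rather than a statement of what GHC guarantees: an IORef from base stands in for the large vector (an assumption made so that `mkWeakIORef` can be used to attach the weak pointer), and the names `droppedBeforeGC` and `usedAfterGC` are made up for the example.

```haskell
import Data.IORef (mkWeakIORef, newIORef, readIORef)
import Data.Maybe (isJust)
import System.Mem (performGC)
import System.Mem.Weak (deRefWeak)

-- The IORef is never used after the weak pointer is created, so nothing
-- refers to it any more by the time the collector runs.
droppedBeforeGC :: IO Bool
droppedBeforeGC = do
    ref <- newIORef (replicate 1000 'x')
    wk  <- mkWeakIORef ref (return ())
    performGC
    isJust <$> deRefWeak wk

-- The IORef is read after the collection, so it is still reachable while
-- the collector runs and has to be kept alive.
usedAfterGC :: IO Bool
usedAfterGC = do
    ref <- newIORef (replicate 1000 'x')
    wk  <- mkWeakIORef ref (return ())
    performGC
    alive <- isJust <$> deRefWeak wk
    contents <- readIORef ref
    length contents `seq` return alive

main :: IO ()
main = do
    droppedBeforeGC >>= \alive ->
        putStrLn ("unused afterwards, still alive: " ++ show alive)
    usedAfterGC >>= \alive ->
        putStrLn ("used afterwards, still alive:   " ++ show alive)
```

In `usedAfterGC` the reference has to survive the collection, because the program still needs it afterwards; calling performGC changes nothing there. In `droppedBeforeGC` I would expect a compiled program to report that the reference is gone, but that depends on how the code is compiled and run (GHCi in particular may keep things alive longer), so treat that half as an observation to try rather than a promise.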