How to take a lazy ByteString and write it to a file (in constant memory) using conduit

Time:10-28

I am streaming the download of an S3 file using amazonka, and I use the sinkBody function to consume the streamed response body. Currently, I download the file as follows:

getFile bucketName fileName = do
    resp <- send (getObject (BucketName bucketName) fileName)
    sinkBody (resp ^. gorsBody) sinkLazy

Here sinkBody :: MonadIO m => RsBody -> ConduitM ByteString Void (ResourceT IO) a -> m a. Since I want to run in constant memory, I thought sinkLazy would be a good way of getting a value out of the conduit stream.

After this, I would like to save the lazy bytestring of data (S3 file) into a local file, for which I use this code:

-- fetch stream of data from S3
bytestream <- liftIO $ AWS.runResourceT $ runAwsT awsEnv $ getFile serviceBucket key

-- create a file
liftIO $ writeFile filePath  ""

-- write content of stream into the file (strict version), keeps data in memory...
liftIO $ runConduitRes $ yield bytestream .| mapC B.toStrict .| sinkFile filePath

But this code has the flaw that the whole lazy bytestring has to be realised in memory, which means that it cannot run in constant space.

  • Is there any way that I can use conduit to yield a lazy bytestring and save it into a file in constant memory?

  • Or is there any other approach that does not use sinkLazy and solves the problem of saving into a file in constant space?

EDIT

I also tested writing the lazy bytestring directly to a file, as follows, but this uses about twice the file size in memory. (The writeFile here is from Data.ByteString.Lazy.)

bytestream <- liftIO $ AWS.runResourceT $ runAwsT awsEnv $ getFile serviceBucket key
writeFile filename bytestream

Answer:

Well, the purpose of a streaming library like conduit is to realize some of the benefits of lazy data structures and actions (lazy ByteStrings, lazy I/O, etc.) while keeping memory usage under better control. The purpose of the sinkLazy function is to take data out of the conduit ecosystem, with its well-controlled memory footprint, and back into the wild West of lazy objects with their associated space leaks. So, that's your problem right there.
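To make the difference concrete, here is a minimal sketch that needs no AWS setup. The name fakeBody is a hypothetical stand-in for the S3 response body, and the 4 KiB chunk size is an arbitrary assumption. It contrasts sinkLazy, which only returns once every chunk has been collected, with a consumer that handles one chunk at a time:

```haskell
import Conduit
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL

-- Hypothetical stand-in for a streamed response body: n chunks of 4 KiB each.
fakeBody :: Monad m => Int -> ConduitT i BS.ByteString m ()
fakeBody n = replicateC n (BS.replicate 4096 0)

-- sinkLazy gathers every chunk before it returns, so the whole payload
-- is resident in memory by the time we get the lazy ByteString back.
collected :: Int -> IO BL.ByteString
collected n = runConduit $ fakeBody n .| sinkLazy

-- A streaming consumer: only the running total and the current chunk
-- are needed at any point.
streamedLength :: Int -> IO Int
streamedLength n =
  runConduit $ fakeBody n .| foldlC (\acc chunk -> acc + BS.length chunk) 0

main :: IO ()
main = do
  whole <- collected 1000
  print (BL.length whole)
  total <- streamedLength 1000
  print total
```

Both versions see the same bytes; the difference is only in how much of the stream has to be held at once.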

Rather than sink the stream out of conduit and into a lazy ByteString, you probably want to keep the data in conduit and sink the stream directly into the file, using something like sinkFile. I don't have an AWS test program up and running, but the following type checks and probably does what you want:

import Conduit
import Control.Lens
import Network.AWS
import Network.AWS.S3

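-- Stream the S3 object straight into a local file, chunk by chunk.
getFile bucketName fileName outputFileName = do
    resp <- send (getObject (BucketName bucketName) fileName)
    -- sinkFile writes each chunk as it arrives instead of collecting them
    sinkBody (resp ^. gorsBody) (sinkFile outputFileName)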
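On the first bullet of the question: if you already hold a lazy ByteString that really is produced lazily, conduit's sourceLazy yields its chunks one at a time, so there is no need for yield plus B.toStrict, which concatenates everything into a single strict chunk. The sketch below uses a hypothetical helper writeLazy and an arbitrary output path. Note that this will not rescue the result of sinkLazy, since that value is already fully in memory by the time you get it.

```haskell
import Conduit
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL

-- Hypothetical helper: write a lazy ByteString to a file chunk by chunk.
-- sourceLazy emits the underlying strict chunks one at a time.
writeLazy :: FilePath -> BL.ByteString -> IO ()
writeLazy path body = runConduitRes $ sourceLazy body .| sinkFile path

main :: IO ()
main = do
  -- The file name is an arbitrary choice for this example.
  writeLazy "example-output.bin" (BL.replicate 100000 65)
  written <- BS.readFile "example-output.bin"
  print (BS.length written)
```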