Node Read Streams - How can I limit the number of open files?


I'm running into AggregateError: EMFILE: too many open files while streaming multiple files.

Machine details: macOS Monterey, MacBook Pro (14-inch, 2021), Apple M1 Pro chip, 16 GB memory, Node v16.13.0

I've tried increasing the limits with no luck. Ideally I would like to be able to set the limit of the number of files open at one time or resolve by closing files as soon as they have been used.

Code below. I've tried to remove the unrelated code and replace it with '//...'.

const MultiStream = require('multistream');
const fs = require('fs-extra'); // Also tried graceful-fs and the standard fs
const { fdir } = require("fdir");
// Also have a require for the bz2 and split2 functions but editing from phone right now

//...

let files = [];

//...

(async() => {

  const crawler = await new fdir()
    .filter((path, isDirectory) => path.endsWith(".bz2"))
    .withFullPaths()
    .crawl("Dir/Sub Dir")
    .withPromise();

  for(const file of crawler){
    files = [...files, fs.createReadStream(file)]
  }

  const multi = new MultiStream(files)
    // Unzip
    .pipe(bz2())
    // Create chunks from lines
    .pipe(split2())
    .on('data', function (obj) {
      // Code to filter data and extract what I need
      //...
    })
    .on("error", function(error) {
      // Handling parsing errors
      //...
    })
    .on('end', function () {
      // Output results
      //...
    });

})();

CodePudding user response:

To avoid pre-opening a file handle for every single file in your array, you want to open each file on demand, when it is that particular file's turn to be streamed. multistream supports this directly.

Per the multistream docs, you can lazily create the read streams by changing this:

  for(const file of crawler){
    files = [...files, fs.createReadStream(file)]
  }

to this:

  let files = crawler.map((f) => {
    return function () {
      return fs.createReadStream(f);
    };
  });

CodePudding user response:

After reading over the npm page for multistream, I think I have found something that will help. I have also changed how you add streams to the files array: there is no need to create a new array and spread the existing elements on every iteration; push is simpler and avoids repeated copying.

To lazily create the streams, wrap them in a function:

    var streams = [
      fs.createReadStream(__dirname + '/numbers/1.txt'),
      function () { // will be executed when the stream is active
        return fs.createReadStream(__dirname + '/numbers/2.txt')
      },
      function () { // same
        return fs.createReadStream(__dirname + '/numbers/3.txt')
      }
    ]

    new MultiStream(streams).pipe(process.stdout) // => 123

With that, we can apply the same idea to your code by wrapping each read stream in a function, so the streams are not created until they are needed. This prevents too many files from being open at once. Update your file loop to:

  for (const file of crawler) {
    files.push(function () {
      return fs.createReadStream(file);
    });
  }