Node.js: how to optimize writing very large XML files?


I have a huge CSV (1.5 GB) which I need to process line by line to construct two XML files. When I run the processing alone, my program takes about 4 minutes to execute; if I also generate the XML files, it takes over 2.5 hours to produce two 9 GB XML files.

My code for writing the XML files is really simple: I use fs.appendFileSync to write the opening/closing XML tags and the text inside them. To sanitize the data, I run this method on the text inside the XML tags:

  String.prototype.escapeXml = function () {
    // "&" must be replaced first, otherwise the entities
    // added below would themselves get double-escaped
    return this.replace(/&/g, "&amp;")
      .replace(/</g, "&lt;")
      .replace(/>/g, "&gt;")
      .replace(/"/g, "&quot;")
      .replace(/'/g, "&apos;");
  };
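For example, 'a < b & "c"'.escapeXml() returns 'a &lt; b &amp; &quot;c&quot;'.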

Is there something I could optimize to reduce the execution time?

CodePudding user response:

fs.appendFileSync() is a relatively expensive operation: each call opens the file, appends the data, then closes it again.

It'll be faster to use a writable stream:

const fs = require('node:fs');

// create the stream
const stream = fs.createWriteStream('output.xml');

// then for each chunk of XML
stream.write(yourXML);

// when done, end the stream to close the file
stream.end();
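
One caveat: stream.write() returns false once the stream's internal buffer exceeds its highWaterMark, and you should wait for the 'drain' event before writing more, or memory usage can balloon on a 9 GB output. A minimal sketch of that backpressure handling (writeChunk and toXml are illustrative names, not part of the fs API):

// write a chunk, waiting for 'drain' if the buffer is full
function writeChunk(chunk) {
  return new Promise((resolve) => {
    if (!stream.write(chunk)) {
      // buffer is over the highWaterMark; resume once it drains
      stream.once('drain', resolve);
    } else {
      resolve();
    }
  });
}

// usage inside an async processing loop:
// for await (const row of csvRows) {
//   await writeChunk(toXml(row));
// }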

CodePudding user response:

I drastically reduced the execution time (down to 30 minutes) by doing two things:

  1. Setting the environment variable UV_THREADPOOL_SIZE=64 (asynchronous fs operations run on libuv's thread pool, which defaults to only 4 threads)
  2. Buffering my writes to the XML file: I accumulate the output in memory and only flush the buffer to the file after every 20,000 closed tags (see the sketch below)
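
A minimal sketch of the buffering idea, under the same synchronous-append setup as the question (the names buffer, FLUSH_EVERY, and appendElement are illustrative, not my exact code):

const fs = require('node:fs');

const FLUSH_EVERY = 20000; // flush after this many closed tags
let buffer = [];
let closedTags = 0;

// queue one <tag>text</tag> element instead of hitting the disk per element
function appendElement(tag, text) {
  buffer.push(`<${tag}>${text}</${tag}>`);
  closedTags++;
  if (closedTags >= FLUSH_EVERY) flush();
}

// one appendFileSync per 20,000 elements instead of one per element
function flush() {
  if (buffer.length === 0) return;
  fs.appendFileSync('output.xml', buffer.join('\n') + '\n');
  buffer = [];
  closedTags = 0;
}

// ... process the CSV, calling appendElement() per record ...
// flush(); // don't forget to flush the remainder at the end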