I have a huge CSV (1.5 GB) that I need to process line by line to construct two XML files. When I run the processing alone, my program takes about 4 minutes to execute; if I also generate the XML files, it takes over 2.5 hours to produce two 9 GB XML files.
My code for writing the XML files is really simple: I use fs.appendFileSync to write my opening/closing XML tags and the text inside them. To sanitize the data, I run this function on the text inside the XML tags.
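function() {
  // Escape the five characters that are special in XML.
  // "&" must be replaced first so the entities added below are not escaped again.
  return this.replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
};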
Is there something I could optimize to reduce the execution time?
CodePudding user response:
fs.appendFileSync() is a relatively expensive operation: it opens the file, appends the data, then closes it again.
It'll be faster to use a writeable stream:
const fs = require('node:fs');
// create the stream
const stream = fs.createWriteStream('output.xml');
// then for each chunk of XML
stream.write(yourXML);
// when done, end the stream to close the file
stream.end();
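With output of this size, it may also be worth respecting backpressure: stream.write() returns false once the stream's internal buffer is full, and the 'drain' event signals that it is safe to continue. A minimal sketch, assuming the XML chunks come from some iterable (writeAll and chunks are hypothetical names, not part of the question's code):

```javascript
const fs = require('node:fs');
const { once } = require('node:events');

// Hypothetical helper: writes every chunk to the stream, pausing while
// the stream's internal buffer is full so memory use stays bounded.
async function writeAll(stream, chunks) {
  for (const chunk of chunks) {
    // write() returns false when the buffer has reached its high-water mark
    if (!stream.write(chunk)) {
      // wait until the buffered data has been flushed
      await once(stream, 'drain');
    }
  }
  // end the stream and wait until everything has been handed to the OS
  stream.end();
  await once(stream, 'finish');
}

// Example call (file name is a placeholder):
// writeAll(fs.createWriteStream('output.xml'), ['<root>', '</root>']);
```

Without the 'drain' check the code still works, but unwritten chunks pile up in memory if the CSV is read faster than the disk can absorb the XML.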
CodePudding user response:
I drastically reduced the execution time (to 30 minutes) by doing two things:
- Setting the environment variable UV_THREADPOOL_SIZE=64
- Buffering my writes to the XML file (I flush the buffer to the file after 20,000 closed tags)
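The buffering idea can be sketched roughly like this. It is not the exact code from my program: the class name BufferedXmlWriter is made up for illustration, and the sketch counts write() calls rather than closed tags, so treat the threshold as an assumption to adapt.

```javascript
const fs = require('node:fs');

// Hypothetical helper: collects strings in memory and writes them to the
// file in one call once `flushEvery` pieces have accumulated.
class BufferedXmlWriter {
  constructor(path, flushEvery = 20000) {
    this.fd = fs.openSync(path, 'w'); // open the file once, not per write
    this.flushEvery = flushEvery;
    this.parts = [];
  }

  // Queue a piece of XML; flush when the threshold is reached.
  write(text) {
    this.parts.push(text);
    if (this.parts.length >= this.flushEvery) {
      this.flush();
    }
  }

  // Write everything queued so far with a single system call.
  flush() {
    if (this.parts.length === 0) return;
    fs.writeSync(this.fd, this.parts.join(''));
    this.parts = [];
  }

  // Flush the remainder and release the file descriptor.
  close() {
    this.flush();
    fs.closeSync(this.fd);
  }
}
```

The point is that the file is opened once and each flush is one large write instead of thousands of small appends; remember to call close() at the end so the last partial buffer is not lost.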