How to read CSV files in chunks and run function for every chunk in JS/Node


I want to read CSV/TSV files in Node/JS and run a function at certain intervals.

For example, I want to collect 10,000 lines into a list, then run a function on those 10,000 lines without reading any further. When the function is done, continue with the next 10,000 lines and do the same until the end of the file.

How can I do that?

I managed to stream the file and run the function every 10,000 lines, but I guess the script runs asynchronously, so all the function calls end up running almost at the same time.

My solution so far:

const fs = require('fs')
const csv = require('csv-parser') // assuming the csv-parser package

fs.createReadStream(file_path)
  .pipe(csv({ separator: '\t' }))
  .on('data', (row) => push_to_list(row)) // <-- stream should wait for func
  .on('end', () => {
    console.log('DONE')
  })

CodePudding user response:

Not sure if this is ideal, but you could accomplish this by pushing each row to an array and calling the function when it hits 10k rows. At the end, if there is anything left in the rows array, call the function with the remaining (fewer than 10k) rows.

const fs = require('fs')
const csv = require('csv-parser') // assuming the csv-parser package

let rows = []
fs.createReadStream(file_path)
  .pipe(csv({ separator: '\t' }))
  .on('data', (row) => {
    rows.push(row)
    if (rows.length === 10000) {
      push_to_list(rows) // <-- stream should wait for func
      rows = []
    }
  })
  .on('end', () => {
    if (rows.length) {
      push_to_list(rows)
    }
    console.log('DONE')
  })
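
If push_to_list is asynchronous and the stream really should wait for it (as the comment above suggests), one option is to consume the stream with for await...of instead of the 'data' event: Readable streams are async iterable in Node 10+, and the loop body finishes before the next row is pulled, so back-pressure is applied automatically. A minimal sketch, assuming push_to_list returns a Promise and file_path is defined elsewhere:

const fs = require('fs')
const csv = require('csv-parser') // assuming the csv-parser package

async function run() {
  let rows = []
  const stream = fs.createReadStream(file_path).pipe(csv({ separator: '\t' }))
  // the next row is only read after the loop body (including the await) finishes
  for await (const row of stream) {
    rows.push(row)
    if (rows.length === 10000) {
      await push_to_list(rows) // assumed to return a Promise
      rows = []
    }
  }
  if (rows.length) {
    await push_to_list(rows) // remaining rows < 10k
  }
  console.log('DONE')
}

run()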

CodePudding user response:

I am thinking something like this:

const lineReader = require('line-reader') // assuming the line-reader package

// getChunks reads the file and yields an array of lines per chunk.
// yield is only legal directly inside the generator body, not inside a
// callback, so the callback-based line-reader API is wrapped in Promises
// and the whole thing becomes an async generator.
async function* getChunks() {
  const reader = await new Promise((resolve, reject) =>
    lineReader.open('/path/to/file', (err, r) => (err ? reject(err) : resolve(r)))
  )
  let lines = []
  while (reader.hasNextLine()) {
    const line = await new Promise((resolve, reject) =>
      reader.nextLine((err, l) => (err ? reject(err) : resolve(l)))
    )
    lines.push(line)
    // when the chunk is large enough, emit it and reset the array
    if (lines.length === 10000) {
      yield lines
      lines = []
    }
  }
  // last odd-sized chunk
  if (lines.length) yield lines
}

// each chunk is an array of up to 10k lines; because getChunks is an async
// generator function (async function*), we can use for await...of
(async () => {
  for await (const chunk of getChunks()) {
    console.log(chunk)
  }
})()

From an engineering and code structure perspective this leaves a lot of work to be done (don't open the file inside of the generator, maybe parameterize the chunk size too, clean up the file handle), but this is a great fit for generator functions.
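
As a sketch of what that cleanup might look like, here is the same idea using Node's built-in readline module instead of line-reader (my substitution, not part of the original answer): the caller owns the file handle, and the chunk size is a parameter of the generator.

const fs = require('fs')
const readline = require('readline')

// chunkLines groups any async iterable of lines into arrays of chunkSize
async function* chunkLines(lines, chunkSize = 10000) {
  let chunk = []
  for await (const line of lines) {
    chunk.push(line)
    if (chunk.length === chunkSize) {
      yield chunk
      chunk = []
    }
  }
  if (chunk.length) yield chunk // last odd-sized chunk
}

;(async () => {
  const rl = readline.createInterface({
    input: fs.createReadStream('/path/to/file'),
    crlfDelay: Infinity, // treat \r\n as a single line break
  })
  for await (const chunk of chunkLines(rl, 10000)) {
    console.log(chunk.length)
  }
})()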
