How to read JSONL line-by-line after hitting url in Node.JS?

Time:10-08

From the Shopify API, I receive a link to a large amount of JSONL. Using NodeJS, I need to read this data line-by-line, as loading it all at once would use lots of memory. When I hit the JSONL url from the web browser, it automatically downloads the JSONL file to my downloads folder.

Example of JSONL:

{"id":"gid:\/\/shopify\/Customer\/6478758936817","firstName":"Joe"}
{"id":"gid:\/\/shopify\/Order\/5044232028401","name":"#1001","createdAt":"2022-09-16T16:30:50Z","__parentId":"gid:\/\/shopify\/Customer\/6478758936817"}
{"id":"gid:\/\/shopify\/Order\/5044244480241","name":"#1003","createdAt":"2022-09-16T16:37:27Z","__parentId":"gid:\/\/shopify\/Customer\/6478758936817"}
{"id":"gid:\/\/shopify\/Order\/5057425703153","name":"#1006","createdAt":"2022-09-27T17:24:39Z","__parentId":"gid:\/\/shopify\/Customer\/6478758936817"}
{"id":"gid:\/\/shopify\/Customer\/6478771093745","firstName":"John"}
{"id":"gid:\/\/shopify\/Customer\/6478771126513","firstName":"Jane"}

I'm unsure how to process this data in NodeJS. Do I need to hit the url, download all of the data and store it in a temporary file, then process the data line-by-line? Or can I read the data line-by-line directly after hitting the url (via some sort of stream?) and process it without storing it in a temporary file on the server?

(The JSONL comes from https://storage.googleapis.com/ if that helps.)

Thanks.

CodePudding user response:

Using axios, you can request the response as a stream (responseType: 'stream') and then use Node's built-in readline module to process the data line by line, without saving it to a temporary file first.

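import axios from 'axios'
import { createInterface } from 'node:readline'

// Ask axios for a stream instead of a fully buffered response body
const response = await axios.get('https://raw.githubusercontent.com/zaibacu/thesaurus/master/en_thesaurus.jsonl', {
  responseType: 'stream'
})

// readline splits the incoming stream into lines as the data arrives
const rl = createInterface({
  input: response.data,
  crlfDelay: Infinity // treat \r\n as a single line break
})

for await (const line of rl) {
  if (line.trim() === '') continue // skip blank lines so JSON.parse doesn't throw

  // do something with the current line
  const { word, synonyms } = JSON.parse(line)
  console.log('word, synonyms: ', word, synonyms)
}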

When I tested this, there was barely any memory usage.
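
For the Shopify data in the question, the lines usually need to be regrouped so that each order ends up with its customer. Here is a minimal sketch of that, assuming parents always appear directly before their children and there is only one level of nesting (as in the sample in the question). groupByParent and onParent are hypothetical names, not part of axios, readline or Shopify's API. It accepts any async iterable of lines, so the rl interface above can be passed in directly.

```javascript
// Hypothetical helper: consumes any async iterable of JSONL lines (such as
// the readline interface `rl`) and calls onParent once per top-level record,
// with its child records attached under `children`.
async function groupByParent(lines, onParent) {
  let current = null // the parent currently being assembled

  for await (const line of lines) {
    if (line.trim() === '') continue // skip blank lines

    const record = JSON.parse(line)

    if (record.__parentId === undefined) {
      // A new top-level record starts: hand over the previous one first.
      if (current) await onParent(current)
      current = { ...record, children: [] }
    } else if (current && record.__parentId === current.id) {
      current.children.push(record)
    } else {
      // Assumption violated: the child does not belong to the current parent.
      throw new Error(`Orphan child record: ${record.id}`)
    }
  }

  if (current) await onParent(current) // hand over the last parent
}
```

Calling it as `await groupByParent(rl, customer => console.log(customer.firstName, customer.children.length))` keeps only one customer and its orders in memory at a time. If your bulk query nests connections more deeply, the grouping logic would need to track more than one level.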

CodePudding user response:

Alternatively, you can use jq, a command-line JSON processor. Magic.

Because it isn't tied to browser or Node code, you can run it wherever you need to parse JSONL.

   jq -cs '.' doodoo.myshopify.com.export.jsonl > out.json

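This takes the bulk file I just downloaded from a query and turns it into a single JSON array to play with or save. Note that -s (slurp) reads the entire input into memory before producing output, so it does not process the file line by line.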
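
If you want the same JSONL-to-JSON-array conversion from Node without holding the whole file in memory, it can be done incrementally. A rough sketch, assuming the input is an async iterable of lines (for example a readline interface); jsonlToJsonArray is a hypothetical name, and this is not how jq works internally:

```javascript
// Hypothetical helper: turns an async iterable of JSONL lines into chunks of
// text that, concatenated, form one JSON array. Only one line is held at a time.
async function* jsonlToJsonArray(lines) {
  let first = true
  yield '['
  for await (const line of lines) {
    if (line.trim() === '') continue // skip blank lines
    JSON.parse(line) // validation only: throws on a malformed line
    yield (first ? '' : ',') + line
    first = false
  }
  yield ']\n'
}
```

The chunks can then be written out as they are produced, for example by wrapping the generator with Readable.from from node:stream and piping it into a file write stream.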
