Extract and Parse Huge incomplete JSON in NodeJS

Time:07-25

Imagine a scenario where I have a huge JSON file in a GitHub gist. The JSON is an array of objects and spans 30k lines. I want to run ETL (Extract, Transform, Load) on that data, directly from the GitHub gist into my database. Unfortunately, the last object in the JSON is incomplete, and I have no control over the external data source. In a simplified demonstration, the data I receive looks like this:

[{ "name": { "first": "foo", "last": "bar" } }, { "name": { "first": "ind", "last": "go

What is the best practice for extracting such a huge JSON file and parsing it correctly in Node.js?

I've tried parsing it with the regular JSON.parse() and with an npm package named partial-json-parser, but neither helped.

CodePudding user response:

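I think you need to fix the JSON structure first. Try this approach:

import untruncateJson from "untruncate-json";

// Truncated input: the last object and the enclosing array are never closed.
const str = `[{ "name": { "first": "foo", "last": "bar" } }, { "name": { 
"first": "ind", "last": "go`;

// The repair function is read from the `default` property of the import.
const fixJson = untruncateJson.default;

// fixJson returns a repaired JSON string, which can then be parsed as usual.
const json = fixJson(str);

console.log(JSON.parse(json));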

CodePudding user response:

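I think you can only parse the data once you have fixed the structure.

Example:

// Truncated input: the second object and the array are never closed.
const truncated = `
    [
      {
        "name": "john",
        "gender": "male"
      },
      {
        "name": "doe",
`;

You can fix the data by appending the missing field and the closing brackets:

const result = JSON.parse(truncated + `
        "gender": "male"
      }
    ]
`);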
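If the missing part of the last record cannot be known in advance, one alternative is to drop the incomplete trailing element and close the array right after the last complete one. The following is only a minimal sketch of that idea, not a method from either answer: `parseCompleteElements` is a hypothetical helper name, and it assumes the whole text fits in memory and that the top-level value is an array whose elements are objects or arrays.

```javascript
// Hypothetical helper: parse only the complete top-level elements
// of a truncated JSON array and discard the unfinished tail.
function parseCompleteElements(text) {
  let depth = 0;         // current nesting depth of {} and []
  let inString = false;  // true while scanning inside a JSON string
  let escaped = false;   // true right after a backslash inside a string
  let lastComplete = -1; // index just past the last complete element

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue; // brackets inside strings must not affect depth
    }
    if (ch === '"') inString = true;
    else if (ch === "{" || ch === "[") depth++;
    else if (ch === "}" || ch === "]") {
      depth--;
      // Back at depth 1: an element of the outer array just closed.
      if (depth === 1) lastComplete = i + 1;
      // Back at depth 0: the input was complete after all.
      if (depth === 0) return JSON.parse(text.slice(0, i + 1));
    }
  }

  // No complete element found (or input is not an array): nothing to load.
  if (lastComplete === -1) return [];
  // Cut after the last complete element and close the outer array.
  return JSON.parse(text.slice(0, lastComplete) + "]");
}

const truncatedInput =
  '[{ "name": { "first": "foo", "last": "bar" } }, { "name": { "first": "ind", "last": "go';
const records = parseCompleteElements(truncatedInput);
console.log(records.length, records[0].name.first);
```

With the truncated sample from the question, only the first record is kept and the half-written second record is discarded, which may be preferable for ETL because no guessed values end up in the database.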