Parsing a stream that returns multiple JSON objects, how to find last object

Time:08-31

In a Node.js application, I am consuming an API that returns multiple concatenated JSON objects. One example response (pretty-printed) looked like this:

{
  "text": "Hey"
}
{
  "entities": {},
  "intents": [],
  "speech": {
    "confidence": 0.5603,
    "tokens": [
      {
        "end": 360,
        "start": 0,
        "token": "Hey"
      }
    ]
  },
  "text": "Hey",
  "traits": {}
}
{
  "entities": {},
  "intents": [],
  "is_final": true,
  "speech": {
    "confidence": 0.5603,
    "tokens": [
      {
        "end": 360,
        "start": 0,
        "token": "Hey"
      }
    ]
  },
  "text": "Hey",
  "traits": {}
}

I want to extract the last object, parsed as JSON. To be precise, in the example stream, I want to get the following object:

{
  "entities": {},
  "intents": [],
  "is_final": true,
  "speech": {
    "confidence": 0.5603,
    "tokens": [
      {
        "end": 360,
        "start": 0,
        "token": "Hey"
      }
    ]
  },
  "text": "Hey",
  "traits": {}
}

To reiterate for clarity: I don't want to merge the objects. I want to find the last object in this stream and return only that object. I also do not have an array of objects; I am receiving a string that is composed of multiple concatenated JSON objects. Providing a method to convert this string to an array of JSON objects is an acceptable solution.

I have searched for a prebuilt solution but could not find one that fits my needs. Most recommendations are to read the stream line by line and parse one JSON object per line, but the API does not return data in that format, so this will not work.

Are there any solutions other than creating a JSON parser?

CodePudding user response:

You can iterate over the characters, count the curly braces, and record where the last top-level object starts and ends.

Something along the lines of:

    const data = `{
                    "text": "Hey"
                  }
                  {
                    "entities": {},
                    "intents": [],
                    "speech": {
                      "confidence": 0.5603,
                      "tokens": [
                        {
                          "end": 360,
                          "start": 0,
                          "token": "Hey"
                        }
                      ]
                    },
                    "text": "Hey",
                    "traits": {}
                  }
                  {
                    "entities": {},
                    "intents": [],
                    "is_final": true,
                    "speech": {
                      "confidence": 0.5603,
                      "tokens": [
                        {
                          "end": 360,
                          "start": 0,
                          "token": "Hey"
                        }
                      ]
                    },
                    "text": "Hey",
                    "traits": {}
                  }`;

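    let bracketCount = 0;
    let lastOpenBracketPosition = 0;
    let lastCloseBracketPosition = 0;

    for (let i = 0; i < data.length; i++) {
        const char = data.charAt(i);
        if (char === '{') {
            // Depth 0 means this brace opens a new top-level object.
            if (bracketCount === 0) lastOpenBracketPosition = i;
            bracketCount++;
        }
        if (char === '}') {
            // Depth 1 means this brace closes the current top-level object.
            if (bracketCount === 1) lastCloseBracketPosition = i;
            bracketCount -= 1;
        }
    }

    // substring() takes an exclusive end index, so add 1 to include the closing brace.
    let lastObject = data.substring(lastOpenBracketPosition, lastCloseBracketPosition + 1);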

CodePudding user response:

Building on @Mathhustt098's answer, I built a very simple character-based parser. There were 4 things to consider:

  1. Curly braces inside of strings do not match with other curly braces
  2. Quotation marks that are escaped with backslash must be ignored
  3. Backslash is not a valid character outside strings
  4. Curly braces need to be checked to make sure they are balanced
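
To illustrate points 1 and 2, here is a small sketch with a made-up input (the `tricky` string and both helper names are hypothetical, not from the API). It compares a plain brace count with one that tracks strings and escapes:

```javascript
// Hypothetical input: the first object's string value contains an unmatched "{"
// and an escaped quotation mark.
const tricky = '{"text": "a { b \\" c"}{"is_final": true}';

// Counts every brace, including those inside string values.
function naiveDepth(input) {
    let depth = 0;
    for (const char of input) {
        if (char === '{') depth += 1;
        if (char === '}') depth -= 1;
    }
    return depth;
}

// Skips escaped characters and ignores braces while inside a string.
function stringAwareDepth(input) {
    let depth = 0;
    let inString = false;
    let isEscaped = false;
    for (const char of input) {
        if (isEscaped) { isEscaped = false; continue; }
        if (inString && char === '\\') { isEscaped = true; continue; }
        if (char === '"') { inString = !inString; continue; }
        if (inString) continue;
        if (char === '{') depth += 1;
        if (char === '}') depth -= 1;
    }
    return depth;
}

console.log(naiveDepth(tricky));       // 1 (looks unbalanced)
console.log(stringAwareDepth(tricky)); // 0 (balanced)
```

The plain count ends at 1 because it treats the brace inside the string as structural, so it would never see the second object start at depth 0.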

With these in mind, I wrote the following code:

function parseMultijson(multijson) {
    if (typeof(multijson) !== 'string') {
        return multijson; // Assume is already parsed as JSON
    }
    let isEscaped = false;
    let lastRootBracket = 0;
    let inString = false;
    let bracketDepth = 0;
    let char;
    for (let i = 0; i < multijson.length; i++) {
        if (isEscaped) {
            isEscaped = false;
            continue;
        }
        char = multijson.charAt(i);

        if (char === "\\") {
            if (inString) {
                isEscaped = true;
            } else {
                throw new Error("Invalid JSON: backslash outside string");
            }
        } else if (char === "\"") {
            inString = !inString;
        } else if (!inString) { 
            if (char === "{") {
                if (bracketDepth === 0) {
                    lastRootBracket = i;
                }
                bracketDepth += 1;
            } else if (char === "}") {
                bracketDepth -= 1;
                if (bracketDepth < 0) {
                    throw new Error("Invalid JSON: unbalanced brackets");
                }
            }
        }
    }

    if (bracketDepth !== 0) {
        throw new Error("Invalid JSON: unbalanced brackets");
    }
    
    return JSON.parse(multijson.substring(lastRootBracket));
}
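
Since the question says an array of objects is also acceptable, the same scan can be adapted to collect every top-level object instead of only the last one. This is a minimal sketch, not a full JSON validator: `splitMultijson` is a hypothetical name, and it assumes every top-level value is an object (no bare arrays, strings, or numbers between objects).

```javascript
// Hypothetical helper: splits a string of concatenated JSON objects
// into an array of parsed objects. Assumes every root value is an object.
function splitMultijson(multijson) {
    const objects = [];
    let inString = false;
    let isEscaped = false;
    let depth = 0;
    let start = 0;
    for (let i = 0; i < multijson.length; i++) {
        const char = multijson.charAt(i);
        if (isEscaped) {
            // The previous character was a backslash inside a string; skip this one.
            isEscaped = false;
        } else if (inString && char === '\\') {
            isEscaped = true;
        } else if (char === '"') {
            inString = !inString;
        } else if (!inString && char === '{') {
            if (depth === 0) start = i; // start of a new root object
            depth += 1;
        } else if (!inString && char === '}') {
            depth -= 1;
            if (depth < 0) throw new Error('Invalid JSON: unbalanced brackets');
            // Back at depth 0: a complete root object spans start..i inclusive.
            if (depth === 0) objects.push(JSON.parse(multijson.substring(start, i + 1)));
        }
    }
    if (depth !== 0) throw new Error('Invalid JSON: unbalanced brackets');
    return objects;
}

const parts = splitMultijson('{"text": "Hey"} {"text": "Hey", "is_final": true}');
const last = parts[parts.length - 1];
console.log(parts.length);  // 2
console.log(last.is_final); // true
```

Parsing each object as soon as its closing brace is found means a malformed earlier object throws from `JSON.parse` rather than being silently skipped. If only the last object matters, the original function above avoids that extra parsing.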