UINT8 Array to String without escape characters

I'm parsing a Uint8 array that is an HTML document. It contains a script tag which in turn contains JSON data that I would like to parse.

I first converted the array to text:

data = Buffer.from(str).toString('utf8')
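
For reference, here is a minimal sketch of that decoding step, assuming the array holds UTF-8 bytes. The sample input is made up for illustration. Buffer and TextDecoder should give the same text for valid UTF-8:

```javascript
// Assumed sample input: the UTF-8 bytes of a tiny HTML fragment (not the real document).
const bytes = new TextEncoder().encode('<p>caf\u00e9</p>');

// Option 1: Node's Buffer, as in the question. Buffer.from(Uint8Array) copies the bytes.
const viaBuffer = Buffer.from(bytes).toString('utf8');

// Option 2: the standard TextDecoder, which also works outside Node.
const viaDecoder = new TextDecoder('utf-8').decode(bytes);

console.log(viaBuffer === viaDecoder); // true
```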

I then searched for the script tag, and extracted the string containing the JSON:

... {\"phrase\":\"Go to \"California\"\",\"color\":\"red\",\"html\":\"<div class=\"myclass\">Ok</div>\"} ...

I then did a replace to clean it up:

data = data.replace(/\\"/g, "\"").replace(/\\/g, "")

which gave:

{"phrase":"Go to "California"","color":"red","html":"<div class="myclass">Ok</div>"}

I tried to parse this with JSON.parse() and got an error because the values contain unescaped quotes. Is there a way to process this further using a regex? Or perhaps a library? I am working with Cheerio, so I can use that if helpful.
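
To make the failure concrete, here is a small sketch using a shortened version of the cleaned-up text (the html field is left out for brevity). Once the backslashes are gone, the parser sees the string value end at the quote just before California:

```javascript
// Shortened version of the cleaned-up text: the inner quotes are no longer escaped.
const cleaned = '{"phrase":"Go to "California"","color":"red"}';

let parseError = null;
try {
  JSON.parse(cleaned);
} catch (err) {
  // The value "Go to " is followed by a bare C instead of a comma or closing brace.
  parseError = err;
}

console.log(parseError instanceof SyntaxError); // true
```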

CodePudding user response：

The escape characters are necessary if you want to parse the JSON. The embedded quotes would need to be double escaped, so the extracted text isn't even valid JSON. Written as a JavaScript string literal, valid input would look like this:

"{\"phrase\":\"Go to \\\"California\\\"\",\"color\":\"red\",\"html\":\"<div class=\\\"myclass\\\">Ok</div>\"}"

or, using single quotes:

'{"phrase":"Go to \\"California\\"","color":"red","html":"<div class=\\"myclass\\">Ok</div>"}'
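
As a quick check, the single-quoted literal above does parse, and JSON.stringify puts the same escaping back. This is a sketch using only the values from the question:

```javascript
// In a single-quoted JS literal, each \\ is one backslash in the actual string,
// so the JSON text contains \" around the embedded quotes.
const jsonText = '{"phrase":"Go to \\"California\\"","color":"red","html":"<div class=\\"myclass\\">Ok</div>"}';

const parsed = JSON.parse(jsonText);
console.log(parsed.phrase); // Go to "California"
console.log(parsed.html);   // <div class="myclass">Ok</div>

// Serializing again restores the backslashes, which is why they must not be stripped.
const roundTrip = JSON.stringify(parsed);
console.log(roundTrip === jsonText); // true
```

So rather than stripping backslashes with a regex, the escaping needs to survive until JSON.parse has run.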

CodePudding user response：

Thanks.

After some more tinkering, I realized that I should have encoded the data to a Uint8 array at the source (a Lambda function) before transmitting it for further processing. So now, I have:

1. Start with the text.
2. Encode the text to Uint8.
3. Return it from the Lambda function.
4. Decode from Uint8 back to text.
5. Process it directly, as there are no stray escape characters.

Before, I was skipping step 2, so Lambda was encoding the text however it does by default.
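
Roughly, the round trip looks like the sketch below. The function names produce and consume are hypothetical stand-ins for the Lambda side and the receiving side, and the sketch does not reproduce whatever Lambda does by default. It only shows that an explicit encode followed by a decode returns the text unchanged:

```javascript
// Hypothetical stand-in for the Lambda side: encode the text to UTF-8 bytes (step 2).
function produce(text) {
  return new TextEncoder().encode(text); // Uint8Array
}

// Hypothetical stand-in for the receiving side: decode the bytes back to text (step 4).
function consume(bytes) {
  return new TextDecoder('utf-8').decode(bytes);
}

// Sample payload built from the values in the question.
const original = JSON.stringify({
  phrase: 'Go to "California"',
  html: '<div class="myclass">Ok</div>',
});

const decoded = consume(produce(original));
console.log(decoded === original);       // true
console.log(JSON.parse(decoded).phrase); // Go to "California"
```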