I'm parsing a Uint8 array that is an HTML document. It contains a script tag which in turn contains JSON data that I would like to parse.
I first converted the array to text:
data = Buffer.from(str).toString('utf8')
I then searched for the script tag, and extracted the string containing the JSON:
... {\"phrase\":\"Go to \"California\"\",\"color\":\"red\",\"html\":\"<div class=\"myclass\">Ok</div>\"} ...
I then did a replace to clean it up.
data = data.replace(/\\"/g, "\"").replace(/\\/g, "").
{"phrase":"Go to "California"","color":"red","html":"<div >Ok</div>"}
I tried to parse using JSON.parse() and got an error because the attributes contain quotes. Is there a way to process this further using a regex ? Or perhaps a library? I am working with Cheerio, so can use that if helpful.
CodePudding user response:
The escape characters are necessary if you want to parse the JSON. The embedded quotes would need to be double escaped, so the extracted text isn't even valid JSON.
"{\"phrase\":\"Go to \\\"California\\\"\",\"color\":\"red\",\"html\":\"<div class=\\\"myclass\\\">Ok</div>\"}"
or, using single quotes:
'{"phrase":"Go to \\"California\\"","color":"red","html":"<div class=\\"myclass\\">Ok</div>"}'
CodePudding user response:
Thanks.
After some more tinkering around, I realized that I should have encoded the data to Uint8 at the source (a Lambda function) before transmitting it for further processing. So now, I have:
- Text
- Encoded text to Uint8
- Return from Lambda function.
- Decode from Uint8 to text
- Process readily as no escape characters.
Before, I was skipping step 2. And so Lambda was encoded the text however it does by default.