UINT8 Array to String without escape characters

Time:11-02

I'm parsing a Uint8 array that is an HTML document. It contains a script tag which in turn contains JSON data that I would like to parse.

I first converted the array to text:

data = Buffer.from(str).toString('utf8')

I then searched for the script tag, and extracted the string containing the JSON:

... {\"phrase\":\"Go to \"California\"\",\"color\":\"red\",\"html\":\"<div class=\"myclass\">Ok</div>\"} ...

I then did a replace to clean it up.

data = data.replace(/\\"/g, "\"").replace(/\\/g, "")

{"phrase":"Go to "California"","color":"red","html":"<div class="myclass">Ok</div>"}

I tried to parse this using JSON.parse() and got an error because the values themselves contain quotes. Is there a way to process this further using a regex? Or perhaps a library? I am working with Cheerio, so I can use that if helpful.

CodePudding user response:

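The escape characters are necessary if you want to parse the JSON. The quotes embedded inside the string values would need to be double escaped, so the extracted text isn't valid JSON even before you strip anything. Written as a JavaScript string literal, parseable text would look like this: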

"{\"phrase\":\"Go to \\\"California\\\"\",\"color\":\"red\",\"html\":\"<div class=\\\"myclass\\\">Ok</div>\"}"

or, using single quotes:

'{"phrase":"Go to \\"California\\"","color":"red","html":"<div class=\\"myclass\\">Ok</div>"}'
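As a rough illustration of the point above, this Node.js sketch lets JSON.stringify produce the escaping instead of writing it by hand, then shows that stripping the backslashes (as the replace in the question does) leaves text that no longer parses. The object mirrors the sample data from the question; the tryParse helper is a hypothetical name introduced only for this example.

```javascript
// Sample data modelled on the question; the values contain embedded quotes.
const original = {
  phrase: 'Go to "California"',
  color: 'red',
  html: '<div class="myclass">Ok</div>',
};

// JSON.stringify escapes each embedded quote as \" in the JSON text.
const jsonText = JSON.stringify(original);

// With the escapes intact, the text parses back to the same values.
const parsed = JSON.parse(jsonText);

// Hypothetical helper: returns null instead of throwing on invalid JSON.
function tryParse(text) {
  try {
    return JSON.parse(text);
  } catch (err) {
    return null;
  }
}

// Removing every backslash destroys the escapes, so parsing fails.
const stripped = jsonText.replace(/\\/g, '');
const strippedResult = tryParse(stripped);

console.log(parsed.phrase);
console.log(strippedResult);
```

The backslashes in the JSON text carry information: they are what separates a quote inside a value from the quote that ends it, so they have to survive until JSON.parse has run.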

CodePudding user response:

Thanks.

After some more tinkering around, I realized that I should have encoded the data to Uint8 at the source (a Lambda function) before transmitting it for further processing. So now, I have:

  1. Start with the text.
  2. Encode the text to Uint8.
  3. Return it from the Lambda function.
  4. Decode from Uint8 back to text.
  5. Process it directly, since there are no escape characters to strip.
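The steps above can be sketched roughly as follows. This is a minimal sketch under assumptions: it uses TextEncoder and TextDecoder (globals in current Node.js versions) for the encoding and decoding, and a hypothetical fakeLambdaHandler function stands in for the Lambda. How the bytes actually cross the Lambda boundary is not described in the thread and is not modelled here.

```javascript
// Step 1: the text to send, here JSON whose values contain embedded quotes.
const text = JSON.stringify({ phrase: 'Go to "California"', color: 'red' });

// Step 2: encode the text to a Uint8Array of UTF-8 bytes.
const bytes = new TextEncoder().encode(text);

// Step 3: hypothetical stand-in for the Lambda returning the bytes.
function fakeLambdaHandler() {
  return bytes;
}

// Step 4: decode the received bytes back to text.
const received = fakeLambdaHandler();
const decoded = new TextDecoder('utf-8').decode(received);

// Step 5: the decoded text is identical to the original, so it parses as is.
const result = JSON.parse(decoded);

console.log(decoded === text);
console.log(result.phrase);
```

Because the bytes are decoded back to exactly the text that was encoded, no extra layer of escaping is introduced and no replace step is needed before JSON.parse.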

Before, I was skipping step 2, so Lambda was encoding the text however it does by default.
