Using JavaScript to select values in a JSON object within a script tag in an HTML email file

Time:01-19

I need to pull values from a JSON object that's within a script tag in an HTML file. The HTML is actually an email (.eml) file.

I am using Node's "fs" module to read the file, and that works fine. Generally, I know how to select HTML elements (using document.getElementById, innerHTML, etc.) and how to work my way through JSON object hierarchies to select values (using JSON.parse, dot notation, etc.). But I'm not sure how to go about selecting values from within markup like this:

X-Account-Key: account31
X-UIDL: 00001b5f073425
X-Mozilla-Status: 0000
X-Mozilla-Status2: 00000000
X-Mozilla-Keys:
... more email header info ...
<html lang=3D"en-US"> <head> </head> <body> <div>  <script data-scope=3D"in=
boxmarkup" type=3D"application/json">{
  "api_version": "1.0",
  "publisher": {
    "api_key": "67892787u2cfedea31b225240gg3423t9",
    "name": "Google Alerts"
  },
  "cards": [ {
    "title": "Google Alert - \"search keywords\"",
    "subtitle": "Highlights from the latest email",
    "actions":
... and so on with JSON object, then closing script tag...
... email body wrapped in DIV tag ...

What if I want to grab publisher.name or any other property's value from this code?

Any and all pointers appreciated.

Answer:

You'll need to do these steps:

  1. Read the email file (you're already doing that)
  2. Parse the email file and get the HTML body from it
  3. Parse the DOM defined by that HTML
  4. Select the script element
  5. Get its text content
  6. Parse it via JSON.parse
  7. Access the property from the resulting object

You're already reading the file, but just for completeness, here's an example reading it via the fs/promises module's readFile:

import fs from "fs/promises";
//...
const mailText = await fs.readFile("./test.eml");

Then we need to parse it. As you mentioned in a comment, there's a mailparser npm module that does just that:

import { simpleParser } from "mailparser";
// ...
const email = await simpleParser(mailText);
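Parsing matters because the raw file in the question is quoted-printable encoded: attribute values appear as =3D"..." and the data-scope value is split by a soft line break ("in=" at the end of one line, "boxmarkup" on the next). A mail parser undoes that for you. Purely as an illustration of what is being undone, here is a rough sketch; decodeQuotedPrintable is a hypothetical helper written for this example (not part of mailparser), and it assumes a UTF-8 body whose unescaped characters are plain ASCII:

```javascript
// Rough sketch of quoted-printable decoding, for illustration only.
// decodeQuotedPrintable is a hypothetical helper, not a mailparser API.
// Assumes the decoded bytes are UTF-8 and unescaped characters are ASCII.
function decodeQuotedPrintable(input) {
  // 1. Drop soft line breaks: "=" immediately followed by a line ending.
  const joined = input.replace(/=\r?\n/g, "");

  // 2. Turn each "=XX" hex escape into one byte; copy everything else as-is.
  const bytes = [];
  for (let i = 0; i < joined.length; i++) {
    const hex = joined.slice(i + 1, i + 3);
    if (joined[i] === "=" && /^[0-9A-Fa-f]{2}$/.test(hex)) {
      bytes.push(parseInt(hex, 16));
      i += 2; // skip the two hex digits just consumed
    } else {
      bytes.push(joined.charCodeAt(i) & 0xff);
    }
  }

  // 3. Interpret the collected bytes as UTF-8 text.
  return Buffer.from(bytes).toString("utf8");
}

// The opening script tag from the question, including its soft line break.
const rawTag = '<script data-scope=3D"in=\nboxmarkup" type=3D"application/json">';
console.log(decodeQuotedPrintable(rawTag));
// <script data-scope="inboxmarkup" type="application/json">
```

In practice you would let the mail parser handle this (along with headers, multipart boundaries, and other encodings) rather than decoding by hand.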

Then we need to get the HTML body and parse it. There are several DOM parsers for Node.js; here I'm using jsdom:

import { JSDOM } from "jsdom";
// ...
const dom = new JSDOM(email.html);

Then we can use querySelector on dom.window.document to select the script element:

const script = dom.window.document.querySelector("script[type='application/json']");

If there are several, you may need to add more attributes to narrow it down, for instance:

const script = dom.window.document.querySelector("script[type='application/json'][data-scope='inboxmarkup']");

Once you have the script element, you can access its text content via .textContent.

Once you have the text, you can parse it with JSON.parse.

Once you have the object, obj.publisher.name should give you the value you're looking for.

So:

import fs from "fs/promises";
import { simpleParser } from "mailparser";
import { JSDOM } from "jsdom";

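// Read the raw .eml file
const mailText = await fs.readFile(/*...your email file name...*/);
// Parse the email; email.html is the decoded HTML body
const email = await simpleParser(mailText);
// Build a DOM from that HTML and select the JSON script element
const dom = new JSDOM(email.html);
const script = dom.window.document.querySelector("script[type='application/json']");
// Get the element's text and parse it as JSON
const json = script.textContent;
const obj = JSON.parse(json);
// Access the property on the resulting object
const name = obj.publisher.name;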
console.log(name); // "Google Alerts"
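
Note that the code above throws if the script's text isn't valid JSON or if a property along the way is missing. If you'd rather get undefined in those cases, or want to grab "any other property's value" by name, one possible approach is a small lookup helper. This is only a sketch: readJsonPath and its dotted-path convention (for example "cards.0.title") are hypothetical names made up for this example, and the sample text is an abbreviated stand-in for the JSON in the question.

```javascript
// Hypothetical helper: parse JSON text and walk a dotted path such as
// "publisher.name" or "cards.0.title". Returns undefined instead of throwing.
function readJsonPath(jsonText, path) {
  let value;
  try {
    value = JSON.parse(jsonText);
  } catch {
    return undefined; // the text was not valid JSON
  }
  for (const key of path.split(".")) {
    // Stop as soon as there is nothing left to descend into.
    if (value === null || typeof value !== "object") return undefined;
    value = value[key];
  }
  return value;
}

// Abbreviated stand-in for the JSON shown in the question.
const sample = `{
  "api_version": "1.0",
  "publisher": { "name": "Google Alerts" },
  "cards": [ { "title": "Google Alert - \\"search keywords\\"" } ]
}`;

console.log(readJsonPath(sample, "publisher.name"));           // Google Alerts
console.log(readJsonPath(sample, "cards.0.title"));            // Google Alert - "search keywords"
console.log(readJsonPath(sample, "publisher.missing.deeper")); // undefined
console.log(readJsonPath("not json", "publisher.name"));       // undefined
```

With the jsdom code above, you could pass script?.textContent ?? "" as the first argument, so that a missing script element also results in undefined rather than an exception.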