I need to pull values from a JSON object that's within a script tag in an HTML file. The HTML is actually an email (.eml) file.
I am using Node's "fs" module to read the file, and that works fine. Generally, I know how to select HTML elements (using document.getElementById, innerHTML, etc.) and how to work my way through JSON object hierarchies to select values (using JSON.parse, dot notation, etc.). But I'm not sure how to go about selecting values from within code like this:
X-Account-Key: account31
X-UIDL: 00001b5f073425
X-Mozilla-Status: 0000
X-Mozilla-Status2: 00000000
X-Mozilla-Keys:
... more email header info ...
<html lang=3D"en-US"> <head> </head> <body> <div> <script data-scope=3D"in=
boxmarkup" type=3D"application/json">{
"api_version": "1.0",
"publisher": {
"api_key": "67892787u2cfedea31b225240gg3423t9",
"name": "Google Alerts"
},
"cards": [ {
"title": "Google Alert - \"search keywords\"",
"subtitle": "Highlights from the latest email",
"actions":
... and so on with JSON object, then closing script tag...
... email body wrapped in DIV tag ...
What if I want to grab publisher.name or any other property's value from this code?
Any and all pointers appreciated.
CodePudding user response:
You'll need to do these steps:
- Read the email file (you're already doing that)
- Parse the email file and get the HTML body from it
- Parse the DOM defined by that HTML
- Select the script element
- Get its text content
- Parse it via JSON.parse
- Access the property from the resulting object
You're already reading the file, but just for completeness, here's an example reading it via the fs/promises module's readFile:
import fs from "fs/promises";
//...
const mailText = await fs.readFile("./test.eml");
Then we need to parse it. As you mentioned in a comment, there's a mailparser npm module that does just that:
import { simpleParser } from "mailparser";
// ...
const email = await simpleParser(mailText);
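For context on why the email has to be parsed before the HTML can be: the body shown in the question looks quoted-printable encoded, so `=3D` stands for `=` and a trailing `=` marks a soft line break (which is why "inboxmarkup" is split across two lines). simpleParser decodes this for you. The sketch below only illustrates what that decoding involves; decodeQuotedPrintable is a hypothetical helper, not part of mailparser, and it assumes the decoded bytes are UTF-8.

```javascript
// Hypothetical helper, for illustration only: mailparser already does this.
// Assumes the input is ASCII quoted-printable text whose decoded bytes are UTF-8.
function decodeQuotedPrintable(input) {
    const binary = input
        // A "=" at the end of a line is a soft line break: drop it and the newline
        .replace(/=\r?\n/g, "")
        // "=XX" is one byte written as two hex digits
        .replace(/=([0-9A-Fa-f]{2})/g, (_, hex) => String.fromCharCode(parseInt(hex, 16)));
    // Reinterpret the byte-per-character string as UTF-8
    return Buffer.from(binary, "latin1").toString("utf8");
}

// The opening script tag from the question, as it appears in the raw email
const rawTag = '<script data-scope=3D"in=\nboxmarkup" type=3D"application/json">';
const decodedTag = decodeQuotedPrintable(rawTag);
console.log(decodedTag);
// <script data-scope="inboxmarkup" type="application/json">
```

Running a DOM parser on the raw text instead would leave the attribute values as `3D"in=` and so on, so attribute selectors would not match.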
Then we need to get the HTML body and parse it. There are several DOM parsers for Node.js; here I'm using jsdom:
import { JSDOM } from "jsdom";
// ...
const dom = new JSDOM(email.html);
Then we can use querySelector on dom.window.document to select the script element:
const script = dom.window.document.querySelector("script[type='application/json']");
If there are several, you may need to add more attributes to narrow it down, for instance:
const script = dom.window.document.querySelector("script[type='application/json'][data-scope='inboxmarkup']");
Once you have the script element, you can access its text content via .textContent.
Once you have the text, you can parse it with JSON.parse.
Once you have the object, obj.publisher.name should give you the value you're looking for.
So:
import fs from "fs/promises";
import { simpleParser } from "mailparser";
import { JSDOM } from "jsdom";
const mailText = await fs.readFile(/*...your email file name...*/);
const email = await simpleParser(mailText);
const dom = new JSDOM(email.html);
const script = dom.window.document.querySelector("script[type='application/json']");
const json = script.textContent;
const obj = JSON.parse(json);
const name = obj.publisher.name;
console.log(name); // "Google Alerts"
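The code above throws if no matching script element is found (script would be null) or if its content isn't valid JSON. Since the question also asks about "any other property's value", here is a minimal sketch of a more defensive lookup. getJsonValue is a hypothetical helper name, and the sample JSON is a shortened stand-in for the real script content.

```javascript
// Hypothetical helper: parses jsonText and walks the given path of keys/indexes.
// Returns undefined instead of throwing when the text is missing, is not valid
// JSON, or the path does not exist.
function getJsonValue(jsonText, path) {
    if (typeof jsonText !== "string") {
        return undefined;
    }
    let value;
    try {
        value = JSON.parse(jsonText);
    } catch {
        return undefined;
    }
    for (const key of path) {
        if (value === null || typeof value !== "object") {
            return undefined;
        }
        value = value[key];
    }
    return value;
}

// Shortened stand-in for the script element's text content
const sampleJson = '{"api_version":"1.0","publisher":{"name":"Google Alerts"},"cards":[{"title":"Example"}]}';

console.log(getJsonValue(sampleJson, ["publisher", "name"])); // "Google Alerts"
console.log(getJsonValue(sampleJson, ["cards", 0, "title"])); // "Example"
console.log(getJsonValue(null, ["publisher", "name"]));       // undefined
```

With the jsdom code above, you could call it as getJsonValue(script?.textContent, ["publisher", "name"]), which covers the case where querySelector returns null.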