I have a Node.js script that reads HTML from a file as a string, and I would like to extract some data from it. My string (it is a string, not HTML) is as follows:
<tr><td style="text-align: center;">Initial Filing</td></tr>
<tr><td>Debtor</td></tr>
<tr><td class="dName">PO</td></tr>
<tr><td class="dAddress">CLACKAMAS OR 97015</td></tr>
<tr><td>Secured Party</td></tr>
<tr><td class="spName">AS</td></tr>
<tr><td class="spAddress">SPRINGFIELD IL 62708</td></tr>
<tr><td>Debtor</td></tr>
<tr><td class="dName">ONE</td></tr>
<tr><td class="dAddress">CLACKAMAS OR 97015</td></tr>
<tr><td>Secured Party</td></tr>
<tr><td class="spName">ANY</td></tr>
<tr><td class="spAddress">SPRINGFIELD IL 62708</td></tr>
The JavaScript code I'm using is:
fs.readFile('file.txt', 'utf8', function (err, data) {
  if (err) {
    console.log("Error reading file.txt", err);
    process.exit(1);
  }
  var cleanedHtml = /<tr><td>Debtor<\/td><\/tr>(.*?)<tr><td>Secured Party<\/td><\/tr>/g.exec(html);
  console.log(cleanedHtml[1]);
});
It throws this error:
return cleanedHtml[1];
^
TypeError: Cannot read property '1' of null
Is there any issue with my regex? Also, how can I have an end result like this:
PO
CLACKAMAS OR 97015
AS
SPRINGFIELD IL 62708
ONE
CLACKAMAS OR 97015
ANY
SPRINGFIELD IL 62708
Thanks.
CodePudding user response:
If you make sure that the tr elements are inside <table></table>, then you can parse the string using DOMParser() after reading the file:
Demo:
var strHtml = `
<table>
<tr><td style="text-align: center;">Initial Filing</td></tr>
<tr><td>Debtor</td></tr>
<tr><td class="dName">PO</td></tr>
<tr><td class="dAddress">CLACKAMAS OR 97015</td></tr>
<tr><td>Secured Party</td></tr>
<tr><td class="spName">AS</td></tr>
<tr><td class="spAddress">SPRINGFIELD IL 62708</td></tr>
<tr><td>Debtor</td></tr>
<tr><td class="dName">ONE</td></tr>
<tr><td class="dAddress">CLACKAMAS OR 97015</td></tr>
<tr><td>Secured Party</td></tr>
<tr><td class="spName">ANY</td></tr>
<tr><td class="spAddress">SPRINGFIELD IL 62708</td></tr>
</table>
`
var doc = new DOMParser().parseFromString(strHtml, 'text/html');
var els = doc.querySelectorAll('.dName,.spName,.dAddress,.spAddress');
els.forEach((el) => {
console.log(el.textContent);
});
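DOMParser is a browser API and is not a built-in global in Node.js, so a Node script would need either a DOM library or a different approach. As a rough sketch of the latter, the same cells can be picked out with a regular expression keyed on the class names. This assumes every cell looks exactly like the ones in the question (double-quoted class attribute, plain text content, no nested tags); extractCells and sample are hypothetical names used only for illustration.

```javascript
// Hypothetical helper: collects the text of every <td> whose class is
// dName, dAddress, spName, or spAddress, in the order they appear.
function extractCells(html) {
  // [^<]* captures the cell text up to the closing tag (assumes no nested tags).
  const cellPattern = /<td class="(?:dName|dAddress|spName|spAddress)">([^<]*)<\/td>/g;
  return Array.from(html.matchAll(cellPattern), (m) => m[1].trim());
}

// Shortened sample taken from the question's markup.
const sample = [
  '<tr><td style="text-align: center;">Initial Filing</td></tr>',
  '<tr><td>Debtor</td></tr>',
  '<tr><td class="dName">PO</td></tr>',
  '<tr><td class="dAddress">CLACKAMAS OR 97015</td></tr>',
  '<tr><td>Secured Party</td></tr>',
  '<tr><td class="spName">AS</td></tr>',
  '<tr><td class="spAddress">SPRINGFIELD IL 62708</td></tr>',
].join('\n');

console.log(extractCells(sample).join('\n'));
```

In the question's script, the same function could be called on the data argument inside the fs.readFile callback. For markup that is less regular than this, a real HTML parser is likely the safer choice.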
CodePudding user response:
Shouldn't there be brackets after console.log? Is cleanedHtml a list with more than one element? Otherwise there is no cleanedHtml[1].
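On why cleanedHtml can be null in the first place: exec() returns null when the pattern does not match. Two things in the question's snippet may contribute. The callback receives the file contents as data, but the regex is run against html. And "." does not match line breaks unless the s flag is set, so (.*?) cannot span rows that sit on separate lines. The sketch below assumes the rows are separated by newlines, as they appear in the question; debtorBlocks and multiLine are hypothetical names.

```javascript
// Hypothetical helper: returns the markup found between each "Debtor"
// header row and the next "Secured Party" header row.
function debtorBlocks(html) {
  // [\s\S] matches any character, including line breaks.
  const pattern = /<tr><td>Debtor<\/td><\/tr>([\s\S]*?)<tr><td>Secured Party<\/td><\/tr>/g;
  return Array.from(html.matchAll(pattern), (m) => m[1].trim());
}

const multiLine =
  '<tr><td>Debtor</td></tr>\n' +
  '<tr><td class="dName">PO</td></tr>\n' +
  '<tr><td>Secured Party</td></tr>';

// The question's pattern: "." stops at the line break, so nothing matches.
const original =
  /<tr><td>Debtor<\/td><\/tr>(.*?)<tr><td>Secured Party<\/td><\/tr>/.exec(multiLine);
console.log(original); // null

// Checking for null before indexing avoids the TypeError.
console.log(original ? original[1] : 'no match');

console.log(debtorBlocks(multiLine));
```

Because matchAll yields nothing when there is no match, debtorBlocks returns an empty array instead of null, which is easier to handle than the result of a single exec() call.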