I am building out a series of validation functions for my static site generator. One of these validation functions uses a series of .match() calls to parse HTML header tags. Here is the validation function so far:
// 2c. Check that each header tag's next header tag is one level higher, lower, or the same level
const headers = html.match(/<h[1-6][^>]*>.*<\/h[1-6]>/g)
headers?.forEach((header, index) => {
  if (index === headers.length - 1) return
  const currentLevel = parseInt(header.match(/<h([1-6])/)[1])
  const nextLevel = parseInt(headers[index + 1].match(/<h([1-6])/)[1])
  if (nextLevel && Math.abs(currentLevel - nextLevel) > 1) {
    throw new Error(
      `Content for ${directory}/${slug}${contentExtension} has an invalid header hierarchy.`
    )
  }
})
The same error, "Object is possibly 'null'", occurs with both header.match(/<h([1-6])/) and headers[index + 1].match(/<h([1-6])/). However, the validation function works properly with the several Jest test cases I have written. I understand why this error is happening and how to trigger the validation function, but I don't understand how to fix this null object issue.
My first attempt at fixing this was to add a non-null assertion operator to the problematic parts of the code, which didn't resolve the issue. I also tried a typical if null check on the objects. I then tried checks on the index after .match(), but I realized very quickly that this was outside the scope of the error.
Looking at the "Do any of these posts answer your question" section, I tried this answer and received a new error: "Forbidden non-null assertion".
CodePudding user response:
TypeScript is properly warning you that you might not be able to absolutely depend on there always being a match. What if the server that creates the HTML is having issues, and gives you a 404 page instead? Or what if the match fails for some other reason? You should add some logic to not attempt to parse the match result if the match fails.
headers?.forEach((header, index) => {
  if (index === headers.length - 1) return
  const currentLevelMatch = header.match(/<h([1-6])/);
  const nextLevelMatch = headers[index + 1].match(/<h([1-6])/);
  if (!currentLevelMatch || !nextLevelMatch) {
    // you might want to log that this was an unexpected failure
    return;
  }
  const currentLevel = parseInt(currentLevelMatch[1]);
  const nextLevel = parseInt(nextLevelMatch[1]);
  // etc (the level comparison and throw from the question go here)
})
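For completeness, here is a minimal self-contained sketch of the same null-check idea, with no non-null assertions. The helper names headerLevel and findInvalidHeaderJump are hypothetical (they are not in the question), and returning an index instead of throwing is just an assumption to keep the example short. It reuses the question's regular expression, so it assumes each header sits on its own line.

```typescript
// Hypothetical helper: returns the heading level (1-6),
// or null when the string does not contain an opening <h1>..<h6> tag.
function headerLevel(tag: string | undefined): number | null {
  // Optional chaining covers both "no match" (null) and "no capture group".
  const digit = tag?.match(/<h([1-6])/)?.[1];
  return digit === undefined ? null : parseInt(digit, 10);
}

// Hypothetical helper: returns the index of the first header whose successor
// jumps by more than one level, or -1 if no such jump exists.
function findInvalidHeaderJump(html: string): number {
  // match() returns null when nothing matches, so fall back to an empty array.
  const headers = html.match(/<h[1-6][^>]*>.*<\/h[1-6]>/g) ?? [];
  for (let i = 0; i < headers.length - 1; i++) {
    const current = headerLevel(headers[i]);
    const next = headerLevel(headers[i + 1]);
    // Skip pairs that could not be parsed; you might want to log these.
    if (current === null || next === null) continue;
    if (Math.abs(current - next) > 1) return i;
  }
  return -1;
}
```

In the original validation function you could then throw the existing error whenever findInvalidHeaderJump(html) returns something other than -1.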
A more elegant approach than parsing HTML with regular expressions would be to create a document from the HTML string with DOMParser or jsdom. With those, you'll then be able to navigate around the document and its elements using handy methods and properties like .querySelector and .nodeName. One possible approach would be to take the HTML string that one of those creates, and proceed to use that (rather than the input, which sounds like it might sometimes have invalid markup).
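As a rough sketch of the DOM-based idea, the check below works on anything that exposes a nodeName, which real DOM elements do. The names HeadingLike, levelFromNodeName, and hasValidHeadingOrder are hypothetical, and the sketch deliberately leaves out the parsing step so that it does not depend on a particular DOM library.

```typescript
// Minimal shape this sketch relies on; DOM Elements satisfy it via nodeName.
interface HeadingLike {
  nodeName: string;
}

// Hypothetical helper: maps "H1".."H6" (any case) to 1..6, anything else to null.
function levelFromNodeName(nodeName: string): number | null {
  const digit = /^h([1-6])$/i.exec(nodeName)?.[1];
  return digit === undefined ? null : Number(digit);
}

// Hypothetical helper: true when no heading is followed by one
// that is more than a single level higher or lower.
function hasValidHeadingOrder(headings: ArrayLike<HeadingLike>): boolean {
  const levels = Array.from(headings, (h) => levelFromNodeName(h.nodeName))
    .filter((level): level is number => level !== null);
  let previous: number | null = null;
  for (const level of levels) {
    if (previous !== null && Math.abs(level - previous) > 1) return false;
    previous = level;
  }
  return true;
}
```

With a parsed document in hand (for example from new DOMParser().parseFromString(html, "text/html") in a browser, or new JSDOM(html).window.document with jsdom), you would pass document.querySelectorAll("h1, h2, h3, h4, h5, h6"), which returns the headings in document order.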