How can you exclude the pseudo element &shy; when collecting text using textContent?

I collect text from an HTML file using the textContent property. I believe that the pseudo element &shy; is copied as well, since I cannot replace words that contain it. No word that contains &shy; (which is not visible) can be replaced with the actual word. I first tried to remove &shy; using .replace(/&shy;/g, ""), but it still does not work.

Example:

I cannot replace "eftersom" using .replace(/eftersom/g, "???"). As mentioned, the element is not visible after collecting the text with .textContent, but it seems to be there.

I tried multiple regular expressions like:

.replace(new RegExp(`(\\W)(${firstWord.replace(/&shy;/gi, "")})(\\W)`, "gi"), "$1???$3")

where firstWord is a variable.

CodePudding user response：

Try this out and see if it works. It should remove all the &shy; characters from your page's HTML:

console.log(document.body.innerHTML.replace(/\u00AD/g, ''));

This works by searching for the Unicode character U+00AD (the soft hyphen).
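To apply the same idea to the text from the question, here is a minimal sketch. It assumes the string comes from textContent, where &shy; has already been decoded to the single character U+00AD, which is why a pattern looking for the literal text "&shy;" finds nothing. The helper name stripSoftHyphens and the sample string are made up for illustration.

```javascript
// Assumption: `text` stands in for a value read from element.textContent,
// where "&shy;" has already been decoded to the character U+00AD.
const SOFT_HYPHEN = /\u00AD/g;

// Hypothetical helper: strip soft hyphens so plain-word patterns can match.
function stripSoftHyphens(text) {
  return text.replace(SOFT_HYPHEN, "");
}

// Made-up sample with an invisible soft hyphen inside "eftersom".
const text = "foo efter\u00ADsom bar";

// Without cleanup the word is split by U+00AD, so nothing is replaced.
const direct = text.replace(/eftersom/g, "???");

// With cleanup the plain pattern matches.
const cleaned = stripSoftHyphens(text).replace(/eftersom/g, "???");

console.log(direct === text); // true
console.log(cleaned);         // "foo ??? bar"
```

Stripping the character before matching keeps the search pattern simple; the alternative would be to allow an optional \u00AD between every pair of letters in the word.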

CodePudding user response：

If the previous answer didn't work, try this one, which also covers the &shy; entity and the decimal version of the soft hyphen (&#173;).

.replace(/(&shy;|\u00AD|&#173;)/gi, "");

This has been answered before in this question: Remove &shy; (soft hyphen) entity from element
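The entity spellings only matter when the string is raw HTML source (for example, file contents read as text) rather than decoded text. A rough sketch under that assumption follows. The helper name stripSoftHyphenForms and the sample strings are hypothetical, and the hexadecimal form &#xAD; is an addition not mentioned in the answer above.

```javascript
// Matches the named entity, the decimal and hexadecimal references,
// and the literal U+00AD character.
const SOFT_HYPHEN_FORMS = /&shy;|&#173;|&#xAD;|\u00AD/gi;

// Hypothetical helper for raw markup strings.
function stripSoftHyphenForms(markup) {
  return markup.replace(SOFT_HYPHEN_FORMS, "");
}

// Made-up samples, one per spelling of the soft hyphen.
const samples = [
  "efter&shy;som",
  "efter&#173;som",
  "efter&#xAD;som",
  "efter\u00ADsom",
];

const results = samples.map(stripSoftHyphenForms);
console.log(results.every((s) => s === "eftersom")); // true
```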