I collect text from an HTML file using the textContent
property.
I believe that the soft hyphen entity &shy;
is copied as well, since I cannot replace words that contain it. Any word that contains &shy;
(which is not visible) is not matched when I search for the plain word.
I first tried to remove &shy;
using .replace(/&shy;/g, "")
but it still does not work.
Example:
I cannot replace "efter&shy;som"
using .replace(/eftersom/g, "???")
As mentioned, the character is not visible after collecting the text with .textContent
, but it still seems to be there.
I tried multiple regular expressions like:
.replace(new RegExp(`(\\W)(${firstWord.replace(/&shy;/gi, "")})(\\W)`, "gi"), "$1???$3")
where firstWord
is a variable.
CodePudding user response:
Try this out and see if it works. It should strip all the soft hyphens (&shy;)
from your page's HTML and log the result:
console.log(document.body.innerHTML.replace(/\u00AD/g, ''));
This works by searching for the Unicode character U+00AD (soft hyphen).
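Since the question works with a string read from textContent, the same pattern can be applied to that string before the word replacement. A minimal sketch, assuming a hypothetical helper named stripSoftHyphens and a sample sentence standing in for the collected text:

```javascript
// Hypothetical helper: remove every soft hyphen (U+00AD) from a string.
function stripSoftHyphens(text) {
  return text.replace(/\u00AD/g, "");
}

// Sample standing in for element.textContent; the word contains U+00AD.
const collected = "Jag stannar efter\u00ADsom det regnar.";

// Without stripping, the pattern does not match because of the hidden character.
const unchanged = collected.replace(/eftersom/g, "???");

// After stripping, the plain-word replacement finds the word.
const replaced = stripSoftHyphens(collected).replace(/eftersom/g, "???");

console.log(unchanged === collected); // true
console.log(replaced); // "Jag stannar ??? det regnar."
```

Stripping first keeps the later patterns simple, at the cost of losing the hyphenation hints in the resulting string.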
CodePudding user response:
If the previous answer didn't work, try this one, which matches the named entity &shy;, the literal soft hyphen character, and the decimal reference for the soft hyphen (&#173;).
.replace(/(&shy;|\u00AD|&#173;)/gi, "");
This has been answered before in this question: Remove &shy; (soft hyphen) entity from element
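The entity forms occur in raw HTML source text, while a string read from textContent holds the decoded character U+00AD instead, which may be why matching on &shy; alone had no effect in the question. A small sketch covering all three forms; the name SOFT_HYPHEN_PATTERN and both sample strings are assumptions for illustration:

```javascript
// Hypothetical pattern: matches the named entity, the decimal reference,
// and the literal soft hyphen character (U+00AD).
const SOFT_HYPHEN_PATTERN = /&shy;|&#173;|\u00AD/gi;

// Sample inputs (assumptions for illustration).
const rawMarkup = "<p>efter&shy;som, efter&#173;som</p>"; // undecoded HTML source
const decodedText = "efter\u00ADsom"; // e.g. a value read from textContent

const cleanedMarkup = rawMarkup.replace(SOFT_HYPHEN_PATTERN, "");
const cleanedText = decodedText.replace(SOFT_HYPHEN_PATTERN, "");

console.log(cleanedMarkup); // "<p>eftersom, eftersom</p>"
console.log(cleanedText); // "eftersom"
```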