How can you exclude the soft hyphen &shy; when collecting text using textContent?

Time:01-04

I collect text from an HTML file using the textContent property. I believe the soft hyphen &shy; is copied as well, because I cannot replace words that contain it: every word that contains &shy; (which is not visible) fails to be replaced by the intended word. I tried to remove &shy; first using .replace(/&shy;/g, ""), but it still does not work.

Example:

I cannot replace "efter&shy;som" using .replace(/eftersom/g, "???"). As mentioned, the &shy; is not visible after collecting the text with .textContent, but it still seems to be there.
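One way to confirm that the soft hyphen survives textContent is to print the code points of the collected string. The sketch below hard-codes a sample string in place of element.textContent (an assumption, so it runs outside a browser). It also shows why .replace(/&shy;/g, "") has no effect: after parsing, the text holds the decoded character U+00AD, not the five characters "&shy;".

```javascript
// Sample text standing in for element.textContent; "\u00AD" is the
// soft hyphen the browser produced from "&shy;" in the HTML source.
const collected = "efter\u00ADsom";

// The soft hyphen is invisible when printed, but the plain regex still fails.
const plainMatch = /eftersom/.test(collected);

// The text contains the decoded character, not the literal entity text,
// so a regex looking for "&shy;" has nothing to remove.
const containsEntityText = collected.includes("&shy;");

// Listing the code points makes the hidden character visible.
const codePoints = Array.from(collected, (ch) =>
  "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
);

console.log(plainMatch); // false
console.log(containsEntityText); // false
console.log(codePoints.join(" "));
// U+0065 U+0066 U+0074 U+0065 U+0072 U+00AD U+0073 U+006F U+006D
```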

I tried multiple regular expressions like:

.replace(new RegExp(`(\\W)(${firstWord.replace(/&shy;/gi, "")})(\\W)`, "gi"), "$1???$3")

where firstWord is a variable holding the word to replace.
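The attempt above removes &shy; from firstWord, which presumably contains no soft hyphens to begin with; the hidden U+00AD is in the collected text. An alternative that leaves the text's soft hyphens untouched is to let the pattern accept an optional U+00AD between every letter of the word. This is a minimal sketch; escapeRegExp, softHyphenTolerant and replaceWord are hypothetical helper names, not part of any library.

```javascript
// Escape characters that have a special meaning in regular expressions.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: builds a pattern in which every pair of letters in
// `word` may be separated by an optional soft hyphen (U+00AD).
function softHyphenTolerant(word) {
  const clean = word.replace(/\u00AD/g, ""); // drop any soft hyphens in the word itself
  return Array.from(clean, escapeRegExp).join("\u00AD?");
}

// Hypothetical helper: replaces whole-word occurrences of `word` in `text`.
function replaceWord(text, word, replacement) {
  // (^|\W) keeps the preceding character; the lookahead checks the next one
  // without consuming it, so back-to-back occurrences are all replaced.
  const pattern = new RegExp(`(^|\\W)(${softHyphenTolerant(word)})(?=\\W|$)`, "gi");
  // A replacement function avoids "$" in `replacement` being interpreted.
  return text.replace(pattern, (match, before) => before + replacement);
}

const firstWord = "eftersom";
const collected = "Vi stannar, efter\u00ADsom det regnar.";
console.log(replaceWord(collected, firstWord, "???"));
// Vi stannar, ??? det regnar.
```

Unlike the (\W)...(\W) pattern in the question, this version also matches a word at the very start or end of the text. Note that \W treats å, ä and ö as non-word characters, so with Swedish text a match can still begin or end next to one of those letters.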

Answer:

Try this and see if it works. It logs your page's HTML with every soft hyphen (&shy;) removed:

console.log(document.body.innerHTML.replace(/\u00AD/g, ''));

This works by searching for the Unicode character U+00AD (the soft hyphen), which is what &shy; becomes once the browser has parsed the HTML.
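The same character class can be applied directly to the string returned by textContent, which is what the question works with. A minimal sketch, assuming the sample string below stands in for element.textContent; normalizeCollected is a hypothetical name.

```javascript
// Hypothetical helper: remove every soft hyphen (U+00AD) from collected text.
function normalizeCollected(text) {
  return text.replace(/\u00AD/g, "");
}

// Stand-in for element.textContent, which contains the decoded U+00AD.
const collected = "Vi stannar, efter\u00ADsom det regnar.";

// Once the soft hyphens are gone, the plain regex from the question matches.
const result = normalizeCollected(collected).replace(/eftersom/g, "???");
console.log(result); // Vi stannar, ??? det regnar.
```

This only changes the string, not the page. It also discards the hyphenation hints, so if the text is written back into the page, long words will no longer break at those points.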

Answer:

If the previous answer didn't work, try this one, which also matches the &shy; entity and the decimal character reference for the soft hyphen (&#173;):

.replace(/(\u00AD|&shy;|&#173;)/gi, "");
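The three alternatives cover the different ways a soft hyphen can appear: the raw U+00AD character (what textContent returns), the named entity &shy;, and the decimal reference &#173;. The last two typically only show up in raw HTML source that the browser has not parsed, such as a file read as a string. A quick check on a made-up sample:

```javascript
// Matches a raw soft hyphen, the named entity, or the decimal reference.
const softHyphenPattern = /(\u00AD|&shy;|&#173;)/gi;

// Made-up sample containing all three forms.
const sample = "efter\u00ADsom, efter&shy;som, efter&#173;som";

console.log(sample.replace(softHyphenPattern, ""));
// eftersom, eftersom, eftersom
```

The pattern does not cover the hexadecimal reference &#xAD;; adding |&#xad; to the alternation would (the i flag already makes it case-insensitive).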

This has been answered before in the question "Remove &shy; (soft hyphen) entity from element".
