How to match all tags containing specific word with regex?

Time:11-30

I'm trying to find all <div>s in an HTML document that contain the same word; it could be in either the class="" or the id="" attribute.

Example:

<div id="chat_widget_th" class="bg-warning checkbox chat_open_ts">...</div>
<div class="bloom chat_inside_th dark_yellow">...</div>
<div id="opened_widget_chat" class="active show">...</div>
<div class="chat_child modal show fade">...</div>

These four <div>s are from different pages.

They all correspond to a chat popup that I need to exclude. All of them contain the word "chat" in some form.

I need to find all the <div>s (or other tags) that contain the word "chat" and delete them. For this I use the following script:

<script>
var regexclass = /class="\K[^"]*?chat[^"]*?(?=")/;
var regexid = /id="\K[^"]*?chat[^"]*?(?=")/;
$('#regexclass').remove();
$('#regexid').remove();
</script>

The above script works correctly for id="", because it finds everything enclosed in the quotes of the id attribute, which is unique.

For a class, on the other hand, it does not work, because it returns, as I said, everything enclosed in the quotation marks.

I.e.

"bloom .chat_inside_th .dark_yellow"

whereas the function would need to at least remove the spaces between the different classes:

".bloom.chat_inside_th.dark_yellow"

Is there any way to eliminate these spaces when searching for classes or, better yet, find exclusively the class that contains the word "chat" like "chat_inside_th"?

CodePudding user response:

You can use an attribute selector: https://developer.mozilla.org/en-US/docs/Web/CSS/Attribute_selectors

[id*="chat"] matches every element whose id attribute contains "chat"; [class*="chat"] does the same for the class attribute.

$("#go").on("click",e => { $('[id*="chat"],[class*="chat"]').remove(); });
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>

<div id="chat_widget_th" class="bg-warning checkbox chat_open_ts">one</div>
<div class="bloom chat_inside_th dark_yellow">two</div>
<div id="opened_widget_chat" class="active show">three</div>
<div class="chat_child modal show fade">four</div>

<button id="go">go</button>
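
The *= selector tests the whole attribute value, so it does not tell you which individual class matched. If you also need that (the last part of the question), one option is to split the class string yourself. This is only a minimal sketch: classesContaining and toClassSelector are hypothetical helper names, not part of jQuery or the DOM API, and the input is assumed to be the raw class string (for example el.className) containing simple class names that need no CSS escaping.

```javascript
// Hypothetical helper: returns the individual class names in a class
// attribute value that contain `word`.
function classesContaining(classAttr, word) {
  return classAttr
    .split(/\s+/)        // class names are separated by whitespace
    .filter(Boolean)     // drop empty strings from leading/trailing spaces
    .filter(name => name.includes(word));
}

// Hypothetical helper: joins class names into a compound selector such as
// ".a.b". Assumes plain names that do not need CSS escaping.
function toClassSelector(classNames) {
  return classNames.map(name => "." + name).join("");
}

const matched = classesContaining("bloom chat_inside_th dark_yellow", "chat");
console.log(matched);
console.log(toClassSelector(["bloom", "chat_inside_th", "dark_yellow"]));
```

For removing the elements, the attribute selector above is still enough; the helper only matters when the matching class name itself is needed.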

CodePudding user response:

Regex should not be used to parse HTML.

Instead, you can use a DOMParser to parse the string, then select all elements whose id or class attribute contains 'chat' and remove them.

const str = `<div id="chat_widget_th" class="bg-warning checkbox chat_open_ts">...</div>
<div id="opened_widget_chat" class="active show">...</div>
<div class="chat_child modal show fade">...</div>`;

const parsed = new DOMParser().parseFromString(str, 'text/html');
parsed.body.querySelectorAll('[id*=chat],[class*=chat]').forEach(e => e.remove())

const res = parsed.body.innerHTML;
console.log(res)
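
As a side note on why the regexes in the question are unlikely to find anything in JavaScript: \K is a PCRE feature. In a JavaScript regex without the u flag it is just an escaped literal "K". Also, $('#regexid') selects an element whose id is "regexid"; it does not use the variable. The sketch below only illustrates the \K difference on one sample string, using a lookbehind as a rough stand-in (this assumes an engine with ES2018 lookbehind support). The selector approach above is still the better way to remove the elements.

```javascript
const html = '<div id="opened_widget_chat" class="active show">...</div>';

// Without the u flag, \K is an identity escape: it matches a literal "K",
// so this pattern looks for id="K... and finds nothing in the sample.
const withK = /id="\K[^"]*?chat[^"]*?(?=")/;

// A lookbehind keeps id=" out of the match, which is roughly what \K does in PCRE.
const withLookbehind = /(?<=id=")[^"]*?chat[^"]*?(?=")/;

const kResult = withK.exec(html);
const idResult = withLookbehind.exec(html);

console.log(kResult);                   // null
console.log(idResult && idResult[0]);   // opened_widget_chat
```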