Regex just keep the content between tags but select everything

Time:12-03

In VS Code I used the regex pattern <script>(.|\n)*?<\/script> to select everything between <script> tags (including the tags themselves), and it worked great (see the example below).

<html>
<p>dsldsdsd</p>
<p>dsldsdsd</p>
<p>dsldsdsd</p>

*<script>
Some code
</script>*

*<script>
Some code
</script>*

<p>dsldsdsd</p>
<p>dsldsdsd</p>

</html>

With this pattern, everything between the * markers gets selected.
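A quick way to check what this pattern matches outside the editor is a small Python sketch like the one below. It assumes Python's re module handles these constructs (alternation, the lazy *?, the escaped \/) the same way VS Code's JavaScript-based search does; SAMPLE is a trimmed, hypothetical copy of the question's HTML without the * markers. Because *? is lazy, each <script> block comes back as a separate match instead of one match running from the first <script> to the last </script>.

```python
import re

# Hypothetical, trimmed copy of the question's HTML (the * markers are omitted).
SAMPLE = """<html>
<p>dsldsdsd</p>

<script>
Some code
</script>

<script>
Some code
</script>

<p>dsldsdsd</p>
</html>"""

# The question's pattern: "(.|\n)" lets the lazy repetition cross newlines,
# because "." on its own does not match a newline.
SCRIPT_BLOCK = re.compile(r"<script>(.|\n)*?<\/script>")

# finditer + group(0) returns each whole match; findall would instead return
# only the last character captured by the (.|\n) group.
script_blocks = [m.group(0) for m in SCRIPT_BLOCK.finditer(SAMPLE)]

for block in script_blocks:
    print(repr(block))
```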

What I actually want is the opposite: select everything else, but leave the <script> ... </script> blocks (including the tags) unselected, like this:

*<html>
<p>dsldsdsd</p>
<p>dsldsdsd</p>
<p>dsldsdsd</p>*

<script>
Some code
</script>

<script>
Some code
</script>

*<p>dsldsdsd</p>
<p>dsldsdsd</p>

</html>*

I went through some regex documentation online and tried the following pattern to select everything else (leaving the <script> blocks unselected):

^((?!<script>(.|\n)*?<\/script>).)*$

But this only leaves the <script> opening tags unselected. What have I done wrong?
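The behaviour described above can be reproduced with the sketch below. It assumes VS Code's find widget treats ^ and $ as line anchors, which is emulated here with re.MULTILINE; SAMPLE is again a trimmed, hypothetical copy of the question's HTML. Since . does not match a newline, every match is confined to one line, and the negative lookahead only fails at the exact position where a complete <script>...</script> block starts. So only the lines containing an opening <script> tag drop out, while "Some code" and "</script>" are still selected.

```python
import re

# Hypothetical, trimmed copy of the question's HTML.
SAMPLE = """<html>
<p>dsldsdsd</p>

<script>
Some code
</script>

<p>dsldsdsd</p>
</html>"""

# The attempted "negation": a tempered greedy token that consumes one character
# at a time as long as a complete <script>...</script> block does not start there.
# re.MULTILINE makes ^ and $ match at line boundaries (assumed VS Code behaviour).
ATTEMPT = re.compile(r"^((?!<script>(.|\n)*?<\/script>).)*$", re.MULTILINE)

# "." never matches a newline, so each match covers at most one line.
# Empty matches (blank lines) are skipped to keep the output readable.
selected = [m.group(0) for m in ATTEMPT.finditer(SAMPLE) if m.group(0)]
print(selected)
# ['<html>', '<p>dsldsdsd</p>', 'Some code', '</script>', '<p>dsldsdsd</p>', '</html>']
```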

In short, what I'm trying to do is negate the <script>(.|\n)*?<\/script> expression.

Any help is appreciated. Thanks.

Answer:

One idea is to match what you don't want, but capture what you need into group 1 (\1):

<script>[\s\S]*?<\/script>|((?:<(?!script)|[^<])[\s\S]*?)(?=<script|$)

See this demo at regex101

To avoid skipping over an opening <script, the second alternative starts with either a character that is not <, or a < that is not followed by script (checked with a negative lookahead), and then lazily matches until <script occurs or the end ($) is reached.
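Here is a minimal Python sketch of this approach, assuming Python's re semantics are close enough to the regex101 demo for this pattern (one known difference: without re.MULTILINE, Python's $ matches at the end of the input or just before a final newline). The names PATTERN and SAMPLE are illustrative only. Script blocks are matched by the first alternative with group 1 left empty, so collecting group 1 gives the text outside the scripts, and substituting \1 for every match deletes the scripts while keeping everything else.

```python
import re

# Hypothetical, trimmed copy of the question's HTML.
SAMPLE = """<html>
<p>dsldsdsd</p>

<script>
Some code
</script>

<p>dsldsdsd</p>
</html>"""

# The answer's pattern: the first alternative matches (and discards) a whole
# script block; the second captures any other run of text into group 1,
# stopping just before the next "<script" or at the end of the input.
PATTERN = re.compile(
    r"<script>[\s\S]*?<\/script>"
    r"|((?:<(?!script)|[^<])[\s\S]*?)(?=<script|$)"
)

# Keep only the captured (non-script) pieces; script matches have group 1 = None.
outside = [m.group(1) for m in PATTERN.finditer(SAMPLE) if m.group(1) is not None]

# Replacing every match with group 1 drops the script blocks and keeps the rest.
# (Since Python 3.5, an unmatched group in the replacement expands to "".)
without_scripts = PATTERN.sub(r"\1", SAMPLE)

print(outside)
print(without_scripts)
```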
