How to use regex (regular expressions) in Notepad++ to remove all HTML and JSON code that does not c

Time:09-27

Using regular expressions (in Notepad++), I want to find all JSON sections that contain the string foo. Note that the JSON happens to be embedded within a limited set of HTML source code that is loaded into Notepad++.

I've written the following regex to accomplish this task:

({[^}]*foo[^}]*})

This works as expected on every input I need to handle.
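For anyone who wants to check the find pattern outside the editor, here is a minimal Python sketch. It assumes Python's re module treats this simple pattern the same way as the Notepad++ engine, and the sample string is a hypothetical stand-in, not the real page source.

```python
import re

# Same pattern as in the question. The braces are escaped here as a
# precaution; the logic is unchanged.
FOO_SECTION = re.compile(r"\{[^}]*foo[^}]*\}")

# Hypothetical sample input (an assumption for illustration only).
sample = (
    '<div dat="{example foo1}"> </div>\n'
    '<div dat="{example bar}"> </div>\n'
    '<div dat="{example foo2}"> </div>\n'
)

# findall returns the full match text because the pattern has no groups.
matches = FOO_SECTION.findall(sample)
print(matches)  # ['{example foo1}', '{example foo2}']
```

The section containing bar is skipped because [^}]* cannot cross the closing brace to reach a later foo.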

I want to improve my workflow, so instead of just finding all such JSON sections, I want to write a regex to remove all the HTML & JSON that does not match this expression. The result will be only JSON sections that contain foo.

I tried using the Notepad++ regex Replace functionality with this find expression:

(?:({[^}]*?foo[^}]*?})|.)

and this replace expression:

$1\n\n$2\n\n$3\n\n$4\n\n$5\n\n$6\n\n$7\n\n$8\n\n$9\n\n

This works for the last occurrence of foo within the JSON, but it does not find the other occurrences.

How can I improve my code to find all the occurrences?

Here is a simplified minimal example of input and desired output. I hope I haven't simplified it too much for it to be useful:

Simplified input:

<!DOCTYPE html>
  <html>
    <div dat="{example foo1}"> </div>
    <div dat="{example bar}"> </div>
    <div dat="{example foo2}"> </div>
  </html>

Desired output:

{example foo1}

{example foo2}

CodePudding user response:

The comment section was full, so I'm suggesting a pattern here instead.
Let me know if this is close to your intended result:

Find: ^[\S\s]+?({.*?foo\d})|.
Replace all: $1

CodePudding user response:

You can use

{[^}]*foo[^}]*}|((?s:.))

Replace with (?1:$0\n). Details:

  • {[^}]*foo[^}]*} - {, zero or more chars other than }, foo, zero or more chars other than } and then a }
  • | - or
  • ((?s:.)) - Capturing group 1: any one char ((?s:...) is an inline modifier group where . matches all chars including line break chars, same as if you enabled . matches newline option).

The (?1:$0\n) replacement pattern replaces the match with an empty string if Group 1 matched; otherwise the replacement is the matched text + a newline.
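The same keep-or-drop logic can be sketched in Python as a sanity check. Python's re.sub has no conditional replacement syntax like (?1:...), so this sketch swaps in a callback function instead. The function name keep_foo_sections is hypothetical, and the sample text is the simplified input from the question.

```python
import re

# Alternative 1: a {...} section containing foo (kept).
# Alternative 2: any single character, captured as group 1 (dropped).
# re.DOTALL stands in for the (?s:...) inline modifier group.
PATTERN = re.compile(r"\{[^}]*foo[^}]*\}|(.)", re.DOTALL)


def keep_foo_sections(text):
    """Hypothetical helper: keep only the {...foo...} sections, one per line."""
    def replace(match):
        if match.group(1) is not None:
            # Group 1 matched, so this is a stray character: remove it.
            return ""
        # Otherwise the first alternative matched: keep it, add a newline.
        return match.group(0) + "\n"

    return PATTERN.sub(replace, text)


sample = (
    "<!DOCTYPE html>\n"
    "  <html>\n"
    '    <div dat="{example foo1}"> </div>\n'
    '    <div dat="{example bar}"> </div>\n'
    '    <div dat="{example foo2}"> </div>\n'
    "  </html>\n"
)

print(keep_foo_sections(sample))
```

Because the first alternative is tried before the second at every position, a whole foo section is consumed in one match and never gets deleted character by character. Returning two newlines instead of one in the callback would put a blank line between the sections.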

