Home > Blockchain >  A regex that will capture all text after a colon but ends with a comma - within can be nested bracke
A regex that will capture all text after a colon but ends with a comma - within can be nested bracke

Time:12-15

I am trying to redact something that log4j is overriding in general. To do this, I am trying to change a regex to ensure it captures what I need...an example...

"definition":{"schema":{"columns":[{"dataType":"INT","name":"column_a","description":"*** REDACTED ***"},{"dataType":"INT","name":"column_b","description":"description"}]}}}, "some other stuff": ["SOME_STUFF"], etc.

Hoping to capture just...

{"schema":{"columns":[{"dataType":"INT","name":"column_a","description":"*** REDACTED ***"},{"dataType":"INT","name":"column_b","description":"description"}]}}}

I have this...

(?<=("definition":{))(\\.|[^\\])*?(?=}})

Where if I keep adding a } at the end it will keep highlighting what I need. The problem is that there is no set number of nested elements in the list.

Is there anyway to adjust the above so I can capture everything within the outer brackets?

CodePudding user response:

If you don't have other brackets after the last one you're trying to match, this regex should work for you:

(?<=(\"definition\":))\{.*\}(?=\})

The main difference is moving the brackets from the lookarounds to the matching part.

Check the demo here.

CodePudding user response:

This regex should work for you if you cannot use a proper JSON parser:

(?<=\"definition\":). ?}(?=,\h*\")

RegEx Demo

Breakdown:

  • (?<=\"definition\":): Lookbehind condition to make sure we have "definition": before the current position
  • . ?}: Match 1 of any characters ending with }
  • (?=,\h*\"): Lookahead to assert that we have a comma then 0 or more spaces followed by a " ahead of the current position
  • Related