A regex that will capture all text after a colon but ends with a comma

I am trying to redact something that log4j is overriding in general. To do this, I am trying to change a regex to ensure it captures what I need...an example...

"definition":{"schema":{"columns":[{"dataType":"INT","name":"column_a","description":"*** REDACTED ***"},{"dataType":"INT","name":"column_b","description":"description"}]}}}, "some other stuff": ["SOME_STUFF"], etc.

Hoping to capture just...

{"schema":{"columns":[{"dataType":"INT","name":"column_a","description":"*** REDACTED ***"},{"dataType":"INT","name":"column_b","description":"description"}]}}}

I have this...

(?<=("definition":{))(\\.|[^\\])*?(?=}})

Where if I keep adding a } at the end it will keep highlighting what I need. The problem is that there is no set number of nested elements in the list.

Is there anyway to adjust the above so I can capture everything within the outer brackets?

CodePudding user response：

If you don't have other brackets after the last one you're trying to match, this regex should work for you:

(?<=(\"definition\":))\{.*\}(?=\})

The main difference is moving the brackets from the lookarounds to the matching part.

Check the demo here.

CodePudding user response：

This regex should work for you if you cannot use a proper JSON parser:

(?<=\"definition\":). ?}(?=,\h*\")

RegEx Demo

Breakdown:

(?<=\"definition\":): Lookbehind condition to make sure we have "definition": before the current position
. ?}: Match 1 of any characters ending with }
(?=,\h*\"): Lookahead to assert that we have a comma then 0 or more spaces followed by a " ahead of the current position