I am trying to redact something that log4j is overriding in general. To do this, I am trying to change a regex so that it captures what I need. An example:
"definition":{"schema":{"columns":[{"dataType":"INT","name":"column_a","description":"*** REDACTED ***"},{"dataType":"INT","name":"column_b","description":"description"}]}}}, "some other stuff": ["SOME_STUFF"], etc.
Hoping to capture just...
{"schema":{"columns":[{"dataType":"INT","name":"column_a","description":"*** REDACTED ***"},{"dataType":"INT","name":"column_b","description":"description"}]}}}
I have this...
(?<=("definition":{))(\\.|[^\\])*?(?=}})
If I keep adding a } at the end, it keeps highlighting more of what I need. The problem is that there is no set number of nested elements in the list.
Is there any way to adjust the above so I can capture everything within the outer brackets?
CodePudding user response:
If you don't have other brackets after the last one you're trying to match, this regex should work for you:
(?<=(\"definition\":))\{.*\}(?=\})
The main difference is moving the brackets from the lookarounds to the matching part.
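To try this from Java (the question mentions log4j), here is a minimal sketch. The class name GreedyDefinitionMatch, the extract helper, and the sample line are made up for illustration; only the pattern comes from this answer.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of log4j or any library.
final class GreedyDefinitionMatch {
    // Pattern from this answer: the braces are part of the match, and the
    // lookahead leaves the last enclosing '}' out of the result.
    private static final Pattern DEFINITION =
            Pattern.compile("(?<=(\"definition\":))\\{.*\\}(?=\\})");

    // Returns the text matched after "definition":, if any.
    static Optional<String> extract(String logLine) {
        Matcher m = DEFINITION.matcher(logLine);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up sample line, shortened from the one in the question.
        String line = "{\"id\":1,\"definition\":{\"schema\":{\"columns\":"
                + "[{\"name\":\"column_a\"},{\"name\":\"column_b\"}]}}}";
        System.out.println(extract(line).orElse("<no match>"));
    }
}
```

Because .* is greedy, the match runs to the last } in the line that is followed by another }. That is why this only works when no other brackets come after the ones you want.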
CodePudding user response:
This regex should work for you if you cannot use a proper JSON parser:
(?<=\"definition\":).+?}(?=,\h*\")
Breakdown:
(?<=\"definition\":): Lookbehind condition to make sure we have "definition": before the current position
.+?}: Match 1+ of any characters, as few as possible, ending with }
(?=,\h*\"): Lookahead to assert that we have a comma, then 0 or more horizontal whitespace characters, followed by a " ahead of the current position
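Here is a minimal Java sketch of this pattern, assuming the middle part is meant to read .+?} (one or more characters, matched lazily). The class name LazyDefinitionMatch and the extract helper are made up for illustration, and the sample line is shortened from the question.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of log4j or any library.
final class LazyDefinitionMatch {
    // Assumed reading of the pattern: ".+?" is one or more characters, lazy.
    // The closing brace is escaped here; it still matches a literal '}'.
    private static final Pattern DEFINITION =
            Pattern.compile("(?<=\"definition\":).+?\\}(?=,\\h*\")");

    // Returns the text matched after "definition":, if any.
    static Optional<String> extract(String logLine) {
        Matcher m = DEFINITION.matcher(logLine);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        // Sample shortened from the question's log line.
        String line = "\"definition\":{\"schema\":{\"columns\":"
                + "[{\"name\":\"column_a\"},{\"name\":\"column_b\"}]}}}, "
                + "\"some other stuff\": [\"SOME_STUFF\"]";
        System.out.println(extract(line).orElse("<no match>"));
    }
}
```

The match stops at the first } that is followed by a comma and a quote, so on the question's sample it includes all three closing braces. The same rule means it would stop early if a nested object inside the definition were itself followed by ,", for example {"a":{"b":1},"c":2}.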