I am trying to extract nested bracketed types such as [[String]]
with a regular expression. Notice that every opening bracket [ needs a matching closing bracket ]. For this input, I expect the following matches:
[[String]]
[String]
String
If I use \[[^\]]+\]
it stops at the first closing bracket it comes across, without taking into account that another bracket was opened in between and so a second closing bracket is needed. Is this at all possible with a regular expression?
Note: The type can be String, [String] or [[String]], so the number of brackets is not known up front.
Answer:
You can use the following PCRE-compliant regex:
(?=((\[(?:\w+|(?2))*])|\b\w+))
See the regex demo. Details:
- (?= - start of a positive lookahead (necessary to match overlapping strings):
  - ( - start of Capturing group 1 (it will hold the "matches"):
    - (\[(?:\w+|(?2))*]) - Group 2 (technical, used for recursion): a [ char, then zero or more occurrences of either one or more word chars or the whole Group 2 pattern recursed, and then a ] char
    - | - or
    - \b\w+ - a word boundary and one or more word chars (the boundary is necessary because a match is attempted at every position; without it the lookahead would also succeed inside the word and return tring, ring, and so on)
  - ) - end of Group 1
- ) - end of the lookahead.
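To isolate what the recursion contributes, here is a minimal sketch that compares the non-recursive pattern from the question with a recursive one. It uses (?R), which recurses the whole pattern, rather than the (?2) group reference from the answer. The variable names $flat and $nested are illustrative only.

```php
<?php
$s = '[[String]]';

// Non-recursive attempt: [^\]]+ also consumes the inner '[',
// so the match ends at the first ']' it meets.
preg_match('~\[[^\]]+\]~', $s, $flat);

// Recursive version: (?R) re-enters the whole pattern whenever a nested '['
// appears, so every '[' is paired with its own ']'.
preg_match('~\[(?:\w+|(?R))*]~', $s, $nested);

echo $flat[0], "\n";   // [[String]
echo $nested[0], "\n"; // [[String]]
```

The recursive pattern alone returns only the outermost match; the lookahead in the full regex is what lets the inner levels be reported as well.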
See the PHP demo:
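$s = "[[String]]";
// Group 1 sits inside the lookahead, so the matched text is in $m[1];
// $m[0] contains only empty strings.
if (preg_match_all('~(?=((\[(?:\w+|(?2))*])|\b\w+))~', $s, $m)) {
    print_r($m[1]);
}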
Output:
Array
(
    [0] => [[String]]
    [1] => [String]
    [2] => String
)
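Since the type may be String, [String] or [[String]], the same regex could be wrapped in a small helper and tried on all three forms. This is only a sketch: the function name nested_types is hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper: returns every nesting level of a bracketed type,
// outermost first.
function nested_types(string $type): array
{
    // Same regex as in the answer; group 1 is captured inside the lookahead.
    $pattern = '~(?=((\[(?:\w+|(?2))*])|\b\w+))~';
    preg_match_all($pattern, $type, $m);
    return $m[1];
}

foreach (['String', '[String]', '[[String]]'] as $type) {
    echo $type, ' => ', implode(', ', nested_types($type)), "\n";
}
```

Because the recursion handles any depth, the helper does not need to know the number of brackets in advance.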