I'm trying to use a regex in OpenRefine to get everything that isn't inside square brackets, but I'm stuck.
This is what I have so far:
/^([a-z]+)\[[a-z]+\]([a-z]+)/
But I can't "repeat" my rule so that it applies in all of these cases.
Here are my test strings:
abcd[zz]efgh[zz]ijkl[zz]
# i want: abcd efgh ijkl
abcd[zz]efgh[zz]ijkl
# i want: abcd efgh ijkl
abcd[zz]efgh
# i want: abcd efgh
abcd[zz]
# i want: abcd
[zz]abcd
# i want: abcd
Thank you in advance
Answer:
You can extract runs of chars other than [ and ] that are not immediately followed by zero or more square brackets and then a ] char:
(?=([^\]\[]+))\1(?![\]\[]*])
The other part of the trick is to make the first pattern atomic, so that the engine cannot backtrack into it and return only a part of a run of chars. JavaScript regex has no dedicated atomic group syntax, but an atomic pattern can be emulated with a positive lookahead that captures a pattern, followed immediately by a backreference to the captured text.
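To see why the atomic first pattern matters, here is a minimal sketch in Python, chosen only for illustration on the assumption that its re module treats lookaheads the same way for this pattern. It compares the pattern (with the + quantifiers written out) against a variant without the lookahead and backreference. The names ATOMIC and PLAIN are made up for this example.

```python
import re

# Lookahead capture + backreference: the run of non-bracket chars is
# matched as a whole or not at all (emulated atomic group).
ATOMIC = re.compile(r"(?=([^\]\[]+))\1(?![\]\[]*])")

# Comparison only: same idea without the atomic emulation, so the engine
# is free to backtrack and give back chars from the end of the run.
PLAIN = re.compile(r"[^\]\[]+(?![\]\[]*])")

sample = "abcd[zz]efgh"

atomic_matches = [m.group(0) for m in ATOMIC.finditer(sample)]
plain_matches = [m.group(0) for m in PLAIN.finditer(sample)]

print(atomic_matches)  # ['abcd', 'efgh']
print(plain_matches)   # ['abcd', 'z', 'efgh']
```

Without the atomic part, the engine shortens "zz" to "z" so that the negative lookahead no longer sees the closing bracket, and a fragment of the bracketed text leaks into the result.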
Details:
(?=([^\]\[]+))
- a positive lookahead that captures into Group 1 one or more chars other than [ and ]
\1
- the backreference to Group 1 that consumes the text captured into Group 1
(?![\]\[]*])
- a negative lookahead that fails the match if, immediately to the right, there are zero or more [ or ] chars and then a ]
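As a quick check against the test strings from the question, the following sketch runs the pattern over each of them and joins the matches with a space. It is written in Python purely as a stand-in for whatever expression language is used inside OpenRefine, and the helper name outside_brackets is hypothetical.

```python
import re

# The pattern from the answer, with the + quantifiers written out.
PATTERN = re.compile(r"(?=([^\]\[]+))\1(?![\]\[]*])")


def outside_brackets(text: str) -> str:
    """Join every run of text found outside square brackets with a space."""
    return " ".join(m.group(0) for m in PATTERN.finditer(text))


samples = [
    "abcd[zz]efgh[zz]ijkl[zz]",
    "abcd[zz]efgh[zz]ijkl",
    "abcd[zz]efgh",
    "abcd[zz]",
    "[zz]abcd",
]

results = {s: outside_brackets(s) for s in samples}

for source, extracted in results.items():
    print(f"{source} -> {extracted}")
```

Each match is a separate run of text, so joining them is a separate step; the regex itself only finds the pieces.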