regex: get everything that isn't inside square brackets

Time:09-13

I am trying to extract everything that is not inside square brackets with a regex in OpenRefine, but I'm stuck.

I have tried this:

/^([a-z]+)\[[a-z]+\]([a-z]+)/

But I can't "repeat" the rule so that it applies to all of the cases below.

Here are my test strings:

abcd[zz]efgh[zz]ijkl[zz]
# I want: abcd efgh ijkl

abcd[zz]efgh[zz]ijkl
# I want: abcd efgh ijkl

abcd[zz]efgh
# I want: abcd efgh

abcd[zz]
# I want: abcd

[zz]abcd
# I want: abcd

Thank you in advance

CodePudding user response:

You can extract runs of characters other than ] and [ that are not followed by zero or more non-bracket characters and then a ] char (that is, runs that do not sit inside a bracketed section):

(?=([^\]\[]+))\1(?![^\]\[]*])

The trick is also to make the first pattern atomic, so that the engine cannot backtrack into it and return only part of a run. JavaScript regex has no atomic group syntax, but the same effect can be obtained by capturing the pattern inside a positive lookahead and then consuming the captured text with a backreference right after it.
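To see what backtracking into the first pattern can do, here is a minimal sketch in plain Python (an assumption on my side: it runs outside OpenRefine, using Python's `re` module, where lookaheads are also atomic). It deliberately uses a simplified lookahead, `(?!\])`, which is not the pattern from this answer; the names `NAIVE`, `ATOMIC` and `runs` are made up for the illustration.

```python
import re

# Simplified lookahead (illustration only): reject a run of non-bracket
# characters when it is immediately followed by "]".
NAIVE = re.compile(r"[^\]\[]+(?!\])")            # the run may be shortened by backtracking
ATOMIC = re.compile(r"(?=([^\]\[]+))\1(?!\])")   # lookahead + backreference: all or nothing


def runs(pattern, text):
    """Return every full match of pattern in text, left to right."""
    return [m.group(0) for m in pattern.finditer(text)]


sample = "abcd[zz]efgh"
naive_result = runs(NAIVE, sample)
atomic_result = runs(ATOMIC, sample)

print(naive_result)   # ['abcd', 'z', 'efgh']  <- "zz" was shortened to "z"
print(atomic_result)  # ['abcd', 'efgh']
```

In the naive version, the engine gives back one `z` so that the lookahead no longer sees `]`, which leaks part of the bracketed text into the result. The lookahead-plus-backreference form always consumes the whole run or nothing.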

Details:

  • (?=([^\]\[]+)) - a positive lookahead that captures into Group 1 one or more chars other than [ and ]
  • \1 - the backreference to Group 1 that consumes the text captured into Group 1
  • (?![^\]\[]*]) - a negative lookahead that fails the match if, immediately to the right, there are zero or more chars other than [ and ] and then a ].
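As a quick check, the pattern as the details describe it (one or more non-bracket chars, then a negative lookahead for zero or more non-bracket chars followed by `]`) can be run against the test strings from the question. This is a sketch in plain Python rather than in OpenRefine itself; the helper name `outside_brackets` is hypothetical, and joining the matches with a space is my assumption about the desired output format.

```python
import re

# One or more non-bracket chars, consumed atomically via lookahead + backreference,
# and not followed by "zero or more non-bracket chars, then ]".
OUTSIDE_BRACKETS = re.compile(r"(?=([^\]\[]+))\1(?![^\]\[]*\])")


def outside_brackets(text):
    """Join every run of text that lies outside [...] with a single space."""
    return " ".join(m.group(0) for m in OUTSIDE_BRACKETS.finditer(text))


samples = [
    "abcd[zz]efgh[zz]ijkl[zz]",
    "abcd[zz]efgh[zz]ijkl",
    "abcd[zz]efgh",
    "abcd[zz]",
    "[zz]abcd",
]

for s in samples:
    print(s, "->", outside_brackets(s))
```

Because the pattern matches one run at a time, the "repeat" the question asks for comes from iterating over all matches instead of writing one group per segment.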