I have a rather simple regular expression (irony off), and Ruby is not treating it the way I expected.
string = worksheet.sheet_data[5][10].value
string.split(/(?>(?>\([^()]*(?R)?[^()]*\))|(?>\[[^[\]]*(?R)?[^[\]]*\])|(?>{[^{}]*(?R)?[^{}]*})|(?>"[^"]*")|(?>[^(){}[\]", ]+))(?>[ ]*(?R))*/)
I already replaced (?R) with \g<1>, but when I run it I still get the following error: premature end of char-class.
I was told that I need to escape some closing brackets, because in Ruby [^()] is treated as if ] were still part of the set, so I have to change it to [^()\]. I did all of that, and my regex now looks like this:
string.split(/(?>(?>\([^()\]*\g<1>?[^()\]*\))|(?>\[[^[]\]*\g<1>?[^[]\]*])|(?>{[^{}\]*\g<1>?[^{}\]*})|(?>"[^"\]*")|(?>[^(){}[]", \]+))(?>[ \]*\g<1>)*/)
It's basically the same, except that I removed the previous \] escapes, because Ruby treats them as escaped anyway, and added \ before closing brackets where there was none. Ruby still throws the same exception. I tested the regex on regexr.com beforehand, so it must work.
EDIT:
The sample text is: attribute1, attribute2 (further specification,(even further specification, etc), another specification), attribute3, attribute4
I should get: attribute1, attribute2(further specification,(even further specification, etc), another specification), attribute3, attribute4
The commas inside parentheses should be ignored.
CodePudding user response:
Instead of \g<1>, you need \g<0>, since \g<1> recurses the capturing group #1 pattern, while (?R) recurses the whole regex pattern (and the whole pattern is group 0).
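To illustrate whole-pattern recursion on its own, here is a minimal sketch that matches one balanced pair of parentheses with \g<0>. The constant BALANCED and the helper first_balanced are hypothetical names used only for this illustration, and the pattern is a simplified one, not the regex from the question.

```ruby
# Simplified illustration: \g<0> re-enters the whole pattern,
# so nested parentheses are matched recursively.
BALANCED = /\((?:[^()]|\g<0>)*\)/

# Returns the first balanced parenthesised chunk, or nil if there is none.
def first_balanced(text)
  text[BALANCED]
end

puts first_balanced("x (a(b)c) y")
```

With \g<1> the same pattern would not compile as written, because it contains no capturing group 1 to recurse into.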
Make sure you escape [ and ] inside character classes; they are special there in the Onigmo regex library.
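The bracket escaping rule can be checked directly. The following is a small sketch, assuming a hypothetical helper compiles? that simply reports whether Regexp.new accepts a pattern source. Single-quoted strings are used so that the backslashes reach the regex engine unchanged.

```ruby
# Hypothetical helper: true if the source compiles as a Ruby regex.
def compiles?(source)
  Regexp.new(source)
  true
rescue RegexpError
  false
end

# An unescaped [ inside a class opens a nested class, so the outer
# class is never closed.
puts compiles?('[^[\]]*')
# Escaping the only closing bracket also leaves the class unterminated.
puts compiles?('[^()\]*')
# Both brackets escaped inside the class: this is the form to use.
puts compiles?('[^\[\]]*')
```

The first two sources are the shapes used in the question's attempts; the third is the shape used in the corrected pattern.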
You need
(?>(?>\([^()]*\g<0>?[^()]*\))|(?>\[[^\[\]]*\g<0>?[^\[\]]*\])|(?>{[^{}]*\g<0>?[^{}]*})|"[^"]*"|[^(){}\[\]", ]+)(?>[ ]*\g<0>)*
See the Rubular demo.
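As a rough usage sketch: the pattern matches the items themselves rather than the separators, so String#scan looks like a better fit than split here. Two things in this sketch are assumptions, not something stated above: the last alternative is taken to end in + (a bare space after that class would not match a multi-character item like attribute1), and the names TOKEN and top_level_items are made up for the example.

```ruby
# Assumption: the final alternative is quantified with "+".
TOKEN = /(?>(?>\([^()]*\g<0>?[^()]*\))|(?>\[[^\[\]]*\g<0>?[^\[\]]*\])|(?>{[^{}]*\g<0>?[^{}]*})|"[^"]*"|[^(){}\[\]", ]+)(?>[ ]*\g<0>)*/

# Hypothetical helper: returns the top-level items, keeping commas that
# sit inside brackets or quotes. The pattern has no capturing groups,
# so scan returns the full matches as strings.
def top_level_items(text)
  text.scan(TOKEN)
end

sample = "attribute1, attribute2 (further specification,(even further specification, etc), another specification), attribute3, attribute4"
top_level_items(sample).each { |item| puts item }
```

Note that the space between attribute2 and its opening parenthesis is kept in the match; remove it afterwards if the output should not contain it.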