regex inside ruby

Time:06-24

I have a rather simple regular expression (irony off), and Ruby is treating it differently than I expected.

string = puts worksheet.sheet_data[5][10].value

string.split(/(?>(?>\([^()]*(?R)?[^()]*\))|(?>\[[^[\]]*(?R)?[^[\]]*\])|(?>{[^{}]*(?R)?[^{}]*})|(?>"[^"]*")|(?>[^(){}[\]", ] ))(?>[ ]*(?R))*/)

I already took out the (?R) and replaced it with \g<1>, but when I run it I still get the following error: premature end of char-class:

I was told that I need to escape some closing brackets, because in Ruby [^()] gets treated as if ] is still part of the set, so I have to change it to [^()\]. I did all of that, and my regex now looks like this:

string.split(/(?>(?>\([^()\]*\g<1>?[^()\]*\))|(?>\[[^[]\]*\g<1>?[^[]\]*])|(?>{[^{}\]*\g<1>?[^{}\]*})|(?>"[^"\]*")|(?>[^(){}[]", \] ))(?>[ \]*\g<1>)*/) 

It's basically the same, except that I removed the previous \] escapes, because Ruby treats them as escaped anyway, and added \ to closing brackets where there was none. Ruby still throws the same exception. I previously tried the regex on regexr.com, so it must work.

EDIT:

The sample text is attribute1, attribute2 (further specification,(even further specification, etc), another specification), attribute3, attribute4

I should get attribute1, attribute2(further specification,(even further specification, etc), another specification), attribute3, attribute4

The commas inside parentheses should be ignored.

CodePudding user response:

Instead of \g<1>, you need \g<0>: \g<1> recurses the pattern of capturing group 1, whereas (?R) recurses the whole regex pattern, and the whole pattern is group 0.
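To illustrate whole-pattern recursion on its own, here is a minimal sketch that is much smaller than the regex in the question. The constant BALANCED and the helper first_balanced are hypothetical names made up for this example; the only feature taken from the answer is \g<0>.

```ruby
# \g<0> re-enters the whole pattern, which is how (?R) behaves in PCRE.
# The pattern matches one "(" ... ")" group whose contents are either
# non-parenthesis characters or another balanced group.
BALANCED = /\((?:[^()]|\g<0>)*\)/

# Hypothetical helper: returns the first balanced parenthesised group
# found in text, or nil if there is none.
def first_balanced(text)
  text[BALANCED]
end

puts first_balanced("a (b (c) d) e")
```

With \g<1> the recursion would only re-enter capturing group 1, so it is equivalent to (?R) only when group 1 happens to wrap the entire pattern.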

Make sure you escape [ and ] inside character classes; they are special there in the Onigmo regex library.
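The escaping rule can be checked in isolation. The sketch below assumes a hypothetical helper, compiles?, which only reports whether Ruby accepts a pattern. It compares a fragment from the question's second attempt, where \] swallows the closing bracket, with a class that escapes both brackets as recommended.

```ruby
# Hypothetical helper: true if Ruby can compile the pattern source,
# false if the regex engine rejects it with a RegexpError.
def compiles?(source)
  Regexp.new(source)
  true
rescue RegexpError
  false
end

# Single-quoted strings keep the backslashes as written.
# In the first pattern "\]" is a literal "]", so the class never closes.
UNTERMINATED = '[^()\]*'
# Here both brackets are escaped and the final "]" closes the class.
ESCAPED = '[^\[\]]*'

puts compiles?(UNTERMINATED)
puts compiles?(ESCAPED)
p "foo[bar]baz".scan(/[^\[\]]+/)
```

This suggests why the second attempt still failed: adding \ before the closing bracket of a class such as [^()] turns that bracket into a member of the set instead of its terminator.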

You need

(?>(?>\([^()]*\g<0>?[^()]*\))|(?>\[[^\[\]]*\g<0>?[^\[\]]*\])|(?>{[^{}]*\g<0>?[^{}]*})|"[^"]*"|[^(){}\[\]", ] )(?>[ ]*\g<0>)*

See the Rubular demo.
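For the sample text in the question, a narrower sketch may be easier to follow. It is not the regex above: it only handles parentheses, it uses a named group (paren) instead of \g<0>, and the names ITEM and split_top_level are made up for this example. It collects every run of text that contains no comma outside parentheses.

```ruby
# One item is a run of characters that are not commas or parentheses,
# or a balanced parenthesised group (which may itself contain commas).
# \g<paren> recurses only into the named group, not the whole pattern.
ITEM = /(?:[^,()]|(?<paren>\((?:[^()]|\g<paren>)*\)))+/

# Hypothetical helper: returns the top-level comma-separated items,
# with surrounding whitespace removed.
def split_top_level(text)
  items = []
  # The pattern has a group, so scan would yield the group rather than
  # the full match; Regexp.last_match(0) gives the full match instead.
  text.scan(ITEM) { items << Regexp.last_match(0).strip }
  items
end

sample = "attribute1, attribute2 (further specification,(even further " \
         "specification, etc), another specification), attribute3, attribute4"
puts split_top_level(sample)
```

Brackets, braces, and quoted strings, which the full regex also covers, are left out here to keep the sketch short.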
