Regex finding repeating group issue when comma present and 'single line' mode

Time:12-07

I could not debug this in regex101 to find out why it does not work.

https://regex101.com/r/thl3ui/1

/^\s*@MyCustomQuery\(\s*([^\s]+)\s*=\s*(.+)\s*\,\s*([^\s]+)\s*=\s*(.+)\s*\,\s*([^\s]+)\s*=\s*(.+)\s*\)/s

And this is the string I am trying to parse:

@MyCustomQuery(name = "nativeSQL", query = "SELECT emp1.emp_id, emp1.name, emp1.manager_id, "
  "emp1.dept_id, emp1.address_id "   "FROM EMP emp1, EMP emp2 "
  "WHERE ((emp2.EMP_ID = ?) AND (emp2.EMP_ID = emp1.MANAGER_ID))", resultClass = Professor.class)
public class SomeClass {
}

And the result would be:

group1 = name
group2 = "nativeSQL"
group3 = query
group4 = "SELECT emp1.emp_id, emp1.name, emp1.manager_id, " "emp1.dept_id, emp1.address_id " "FROM EMP emp1, EMP emp2 " "WHERE ((emp2.EMP_ID = ?) AND (emp2.EMP_ID = emp1.MANAGER_ID))"
group5 = ... and so on, until the closing ) is found.

The regex works, but it requires a fixed number of groups. When I try to repeat the group instead, I get errors:

^\s*@MyCustomQuery\((\s*([^\s]+)\s*=\s*(.+)\s*)?\,\)

Is it possible to repeat a capture group that contains 2 groups inside, \s*([^\s]+)\s*=\s*(.+)\s*, at every ',' and end the repetition at ')'?

Any help creating a repeating group that captures the key and value pairs in the Java class annotation would be much appreciated.

CodePudding user response:

You already have a few capture groups in the current pattern. If you want to support a variable number of key-value pairs in JavaScript, you cannot repeat a capture group and then read each iteration by number (group 1, group 2, and so on), because a repeated capture group only holds the value of its last iteration.
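A minimal sketch of that behaviour. The pattern and input below are simplified assumptions for illustration (plain \w+ keys and values), not the pattern from the question:

```javascript
// Simplified, hypothetical pattern: one repeated group holding a key and a value.
const repeated = /^\s*@MyCustomQuery\((?:\s*(\w+)\s*=\s*(\w+)\s*,?)+\)/;

const sample = '@MyCustomQuery(a = 1, b = 2, c = 3)';
const lastOnly = sample.match(repeated);

// The group matched three times, but only the last iteration is kept.
console.log(lastOnly[1], lastOnly[2]); // c 3
```

The pairs a = 1 and b = 2 were matched along the way, but they are no longer available from the match result.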

(You can do that, for example, in .NET or with the Python regex module from PyPI, which give access to a captures collection.)

Another option is to use a long list of optional capture groups, but then you would always have to account for the maximum number.
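As a rough sketch of that option, assuming a maximum of three pairs and simple \w+ values (both are assumptions made for this example, and the names are hypothetical):

```javascript
// Hypothetical building block: one simple key = value pair.
const pair = String.raw`(\w+)\s*=\s*(\w+)`;

// One required pair followed by two optional ones, so six groups in total.
const upToThree = new RegExp(
  String.raw`^\s*@MyCustomQuery\(\s*${pair}(?:\s*,\s*${pair})?(?:\s*,\s*${pair})?\s*\)`
);

const found = '@MyCustomQuery(a = 1, b = 2)'.match(upToThree);

// Drop the full match, then keep only the groups that took part in the match.
const values = found.slice(1).filter((g) => g !== undefined);
console.log(values); // [ 'a', '1', 'b', '2' ]
```

The drawback is visible here: the pattern grows with the maximum, and an annotation with a fourth pair would not match at all.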

If your JavaScript engine supports lookbehind assertions, you can match one key-value pair at a time, with the key in group 1 and the value in group 2 or 3, by asserting the opening part, from the start of the string, to the left of each match.

Note that the example string contains a query that itself includes double quotes and commas. This makes the approach error prone, because each separately defined part needs boundaries that you can rely on.

For the example data, you might use:

(?<=^\s*@MyCustomQuery\([^]*)([^\s=,()]+)\s*=\s*(?:"([^]*?)"|([^\s,=]+?))(?:,|\)$)

The pattern matches:

  • (?<= Positive lookbehind, assert what is to the left is
    • ^\s*@MyCustomQuery\( Match @MyCustomQuery( at the start of the string
    • [^]* Optionally repeat matching any character including newlines
  • ) Close lookbehind
  • ([^\s=,()]+) Capture group 1, match 1+ occurrences of any char except those listed in the negated character class
  • \s*=\s* Match an equals sign between optional whitespace chars
  • (?: Non capture group for the alternation, match either
    • "([^]*?)" Capture optional chars between double quotes in group 2
    • | Or
    • ([^\s,=]+?) Capture group 3, match 1+ times any character other than those listed in the character class, non greedy
  • ) Close non capture group
  • (?:,|\)$) Match either a , or ) followed by the end of the string


There are 2 different capture groups for the value. To get the value, check which of the two groups took part in the match: group 2 is set for a quoted value and group 3 for an unquoted one.
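A possible way to collect the pairs with String.prototype.matchAll, assuming an engine with lookbehind support. The helper name parsePairs and the shortened sample input are assumptions made for this sketch:

```javascript
// The pattern from above, with the g flag so matchAll can iterate over every pair.
const pairPattern =
  /(?<=^\s*@MyCustomQuery\([^]*)([^\s=,()]+)\s*=\s*(?:"([^]*?)"|([^\s,=]+?))(?:,|\)$)/g;

// Hypothetical helper: returns [key, value] pairs found in an annotation string.
function parsePairs(text) {
  const pairs = [];
  for (const m of text.matchAll(pairPattern)) {
    // Group 2 is set for quoted values, group 3 for unquoted ones.
    const value = m[2] !== undefined ? m[2] : m[3];
    pairs.push([m[1], value]);
  }
  return pairs;
}

// Shortened sample: the string ends right after the closing parenthesis.
const annotation = [
  '@MyCustomQuery(name = "nativeSQL", query = "SELECT emp1.emp_id, emp1.name "',
  '  "FROM EMP emp1", resultClass = Professor.class)',
].join('\n');

console.log(parsePairs(annotation).map(([key]) => key));
// [ 'name', 'query', 'resultClass' ]
```

Because the pattern ends with \)$, the sample stops at the closing parenthesis. If the input continues with more lines, such as the class body, the m flag would likely be needed so that $ can also match at the end of that line.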
