Java Regex -> how to make WHOLE lookahead lazy

Time:12-21


I am tackling a problem that probably has an easy solution, but I just can't think of it. I have an XML file as input; for testing, I've included one sample structure below.
My goal: given an input String s equal to the value of the 'name' attribute of a <con:testSuite> element (say SUITE2),
I want to match the value of 'name' in each <con:testCase> element, but only for testCase elements inside the chosen testSuite element.

My regex (tested on regex101):

(?<=<con:testSuite[\d\D]{1,60}name=\"SUITE2\")(?:[\d\D]*?<con:testCase[\d\D]*?name=\")(.*?)(?:\")(?=[\d\D]*?</con:testSuite)

In Java: Pattern.compile("(?<=<con:testSuite[\\d\\D]{1,60}name=\\\"SUITE2\\\")(?:[\\d\\D]*?<con:testCase[\\d\\D]*?name=\\\")(.*?)(?:\\\")(?=[\\d\\D]*?</con:testSuite)")

Now this regex returns only the name value of the first testCase. If I remove the first lazy quantifier in the first non-capturing group, it returns only the last one. As I read it, though, I would expect the version without the lazy quantifier to match the name value of every testCase.

...Anyway, I moved on. Since this is by no means production code, just my own utility tool, and I can estimate the maximum number of characters between checkpoints in the XML, I chose to move the non-capturing group into the lookbehind (and give it a bounded length, of course):

(?<=<con:testSuite[\d\D]{1,60}name=\"SUITE2\"[\d\D]{1,60000}<con:testCase[\d\D]{1,50000}name=\")(.*?)(?:\")(?=[\d\D]*?</con:testSuite)

Now this does the magic in terms of finding all the values, but one issue remains: the lookahead. I have [\d\D]*? with a lazy quantifier, yet it ignores the first occurrence of </con:testSuite and matches the last possible one, so it does not satisfy my condition that the values come only from the one chosen con:testSuite element. My Xmas-mooded mind just cannot fight this :)

Sorry for the long post, any help is appreciated <3

-for SUITE2 chosen, desired matches[]=["55555","66666","77777","88888"]
-testing xml structure:

<con:testSuite id="dd1107cb-2f4c-47bd-8af5-f64e0d20354b" name="SUITE1" disabled="true">
  <con:testCase seOnErrors="true" name="44444" searchProp>sdfsddsfsdsd
  </con:testCase>
  <con:testCase seOnErrors="true" name="33333" searchProp>sdfsddsfsdsd
  </con:testCase>
  <con:testCase seOnErrors="true" name="22222" searchProp>sdfsddsfsdsd
  </con:testCase>
  <con:testCase seOnErrors="true" name="11111" searchProp>sdfsddsfsdsd
  </con:testCase>
</con:testSuite>
<con:testSuite id="dd1107cb-2f4c-47bd-8af5-f64e0d20354b" name="SUITE2" disabled="true">
  <con:testCase seOnErrors="true" name="55555" searchProp>sdfsddsfsdsd
  </con:testCase>
  <con:testCase seOnErrors="true" name="66666" searchProp>sdfsddsfsdsd
  </con:testCase>
  <con:testCase seOnErrors="true" name="77777" searchProp>sdfsddsfsdsd
  </con:testCase>
  <con:testCase seOnErrors="true" name="88888" searchProp>sdfsddsfsdsd
  </con:testCase>
</con:testSuite>
<con:testSuite id="dd1107cb-2f4c-47bd-8af5-f64e0d20354b" name="SUITE3" disabled="true">
  <con:testCase seOnErrors="true" name="99999" searchProp>sdfsddsfsdsd
  </con:testCase>
  <con:testCase seOnErrors="true" name="0000" searchProp>sdfsddsfsdsd
  </con:testCase>
  <con:testCase seOnErrors="true" name="11221122" searchProp>sdfsddsfsdsd
  </con:testCase>
  <con:testCase seOnErrors="true" name="33443344" searchProp>sdfsddsfsdsd
  </con:testCase>
</con:testSuite>

CodePudding user response:

Your regex attempt is certainly formidable, but this question is a poster child for exactly when not to use regex. XPath is the right tool.

See XPath below:

//con:testSuite[@name='SUITE2']/con:testCase/@name

Try it out yourself at https://www.freeformatter.com/xpath-tester.html
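Since the question is about Java, here is a minimal sketch of evaluating that XPath with the JDK's built-in javax.xml.xpath API. It assumes the XML is well-formed and binds the con prefix to the placeholder URI http://www.example.com from the sample document in this answer; the class name SuiteXPath, the method testCaseNames, and the XPath variable $suite are made up for illustration. The suite name is passed as an XPath variable rather than concatenated into the expression.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SuiteXPath {
    // Assumption: placeholder namespace URI; replace with the URI your file really declares.
    static final String CON_NS = "http://www.example.com";

    /** Returns the name attribute of every con:testCase inside the named con:testSuite. */
    static List<String> testCaseNames(String xml, String suiteName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for prefixed XPath steps to match
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Map the "con" prefix used in the expression to the namespace URI.
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override public String getNamespaceURI(String prefix) {
                    return "con".equals(prefix) ? CON_NS : XMLConstants.NULL_NS_URI;
                }
                @Override public String getPrefix(String uri) {
                    return CON_NS.equals(uri) ? "con" : null;
                }
                @Override public Iterator<String> getPrefixes(String uri) {
                    return CON_NS.equals(uri)
                            ? Collections.singletonList("con").iterator()
                            : Collections.<String>emptyIterator();
                }
            });
            // Supply the suite name as the XPath variable $suite.
            xpath.setXPathVariableResolver(
                    name -> "suite".equals(name.getLocalPart()) ? suiteName : null);

            NodeList nodes = (NodeList) xpath.evaluate(
                    "//con:testSuite[@name=$suite]/con:testCase/@name",
                    doc, XPathConstants.NODESET);

            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getNodeValue());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse or query the XML", e);
        }
    }

    // Usage: java SuiteXPath.java <xml-file> <suite-name>
    public static void main(String[] args) throws Exception {
        String xml = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        System.out.println(testCaseNames(xml, args[1]));
    }
}
```

The parser rejects input that is not well-formed, so the raw snippet from the question would have to be cleaned up first.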

Just make sure the XML is well-formed and declares the con namespace (note the single root element and searchProp=""):

<?xml version="1.0" encoding="UTF-8"?>
<con:test xmlns:con="http://www.example.com" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="./example1.xsd">
   <con:testSuite id="dd1107cb-2f4c-47bd-8af5-f64e0d20354b" name="SUITE1" disabled="true">
      <con:testCase seOnErrors="true" name="44444" searchProp="">sdfsddsfsdsd</con:testCase>
      <con:testCase seOnErrors="true" name="33333" searchProp="">sdfsddsfsdsd</con:testCase>
      <con:testCase seOnErrors="true" name="22222" searchProp="">sdfsddsfsdsd</con:testCase>
      <con:testCase seOnErrors="true" name="11111" searchProp="">sdfsddsfsdsd</con:testCase>
   </con:testSuite>
   <con:testSuite id="dd1107cb-2f4c-47bd-8af5-f64e0d20354b" name="SUITE2" disabled="true">
      <con:testCase seOnErrors="true" name="55555" searchProp="">sdfsddsfsdsd</con:testCase>
      <con:testCase seOnErrors="true" name="66666" searchProp="">sdfsddsfsdsd</con:testCase>
      <con:testCase seOnErrors="true" name="77777" searchProp="">sdfsddsfsdsd</con:testCase>
      <con:testCase seOnErrors="true" name="88888" searchProp="">sdfsddsfsdsd</con:testCase>
   </con:testSuite>
   <con:testSuite id="dd1107cb-2f4c-47bd-8af5-f64e0d20354b" name="SUITE3" disabled="true">
      <con:testCase seOnErrors="true" name="99999" searchProp="">sdfsddsfsdsd</con:testCase>
      <con:testCase seOnErrors="true" name="0000" searchProp="">sdfsddsfsdsd</con:testCase>
      <con:testCase seOnErrors="true" name="11221122" searchProp="">sdfsddsfsdsd</con:testCase>
      <con:testCase seOnErrors="true" name="33443344" searchProp="">sdfsddsfsdsd</con:testCase>
   </con:testSuite>
</con:test>
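If the file cannot be made well-formed (the snippet in the question has a valueless searchProp attribute and no declaration for the con prefix, which an XML parser rejects), a two-step regex is one possible fallback for a throwaway utility. A lookahead only asserts that some </con:testSuite exists further on, so it cannot by itself confine matches to one suite. This sketch instead captures the body of the chosen suite first and then searches only inside it. The class name SuiteRegex and the method testCaseNames are hypothetical, and it assumes attribute values never contain '>' or '"' and that suites are not nested.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SuiteRegex {
    // Matches the name attribute of each <con:testCase ...> opening tag.
    private static final Pattern TEST_CASE =
            Pattern.compile("<con:testCase\\b[^>]*\\bname=\"([^\"]*)\"");

    static List<String> testCaseNames(String input, String suiteName) {
        // Step 1: find the chosen suite. [^>]* keeps the name check inside the
        // opening tag, and the lazy (.*?) stops at the first closing tag.
        Pattern suite = Pattern.compile(
                "<con:testSuite\\b[^>]*\\bname=\"" + Pattern.quote(suiteName) + "\"[^>]*>"
                        + "(.*?)</con:testSuite>",
                Pattern.DOTALL);

        List<String> names = new ArrayList<>();
        Matcher suiteMatcher = suite.matcher(input);
        if (suiteMatcher.find()) {
            // Step 2: collect testCase names from the captured suite body only.
            Matcher caseMatcher = TEST_CASE.matcher(suiteMatcher.group(1));
            while (caseMatcher.find()) {
                names.add(caseMatcher.group(1));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String input =
                "<con:testSuite id=\"a\" name=\"SUITE1\" disabled=\"true\">\n"
              + "  <con:testCase seOnErrors=\"true\" name=\"44444\" searchProp>x\n  </con:testCase>\n"
              + "</con:testSuite>\n"
              + "<con:testSuite id=\"b\" name=\"SUITE2\" disabled=\"true\">\n"
              + "  <con:testCase seOnErrors=\"true\" name=\"55555\" searchProp>x\n  </con:testCase>\n"
              + "  <con:testCase seOnErrors=\"true\" name=\"66666\" searchProp>x\n  </con:testCase>\n"
              + "</con:testSuite>\n";
        System.out.println(testCaseNames(input, "SUITE2")); // prints [55555, 66666]
    }
}
```

XPath remains the more robust choice whenever the input can be parsed.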