Regex with positive lookbehind that will match multiple lines

Time:07-11

I'm struggling to write a regex that matches multiple lines following a specific string.

Let's say we have sample data like this:

Data1
some changing text
12406943 Old Company New Company reason Something 1/2/2005 10,00
14757152 Old Company 2 New Company 2 Reason 2 Something2 10/7/2007 8,00

Data2
some changing text
12406943 New Company invoice1 31.01.2005 500,00
14757152 New Company 2 invoice2 28.05.2007 1000,00

Previously, I extracted the Data1 rows with this regex:

(?<caseNumber>\d+) +?(?<temp>.*) +?(?>Something|Something2).*(?<originalDate>\d{1,2}/\d{1,2}/\d{4}) +?(?<interestRate>\d{1,3}\.\d{2}).*

and the Data2 rows with:

(?<caseNumber>\d+) +?(?<companyName>.*) +?(?<invoiceNumber>\S*) +?\d{2}\.\d{2}\.\d{4}

Unfortunately, something changed: Data1 now uses the same date format as Data2, so the Data2 regex also picks up the rows from Data1:

Data1
some changing text
12406943 Old Company New Company reason Something 02.01.2005 10,00
14757152 Old Company 2 New Company 2 Reason 2 Something2 07.10.2007 8,00

Data2
some changing text
12406943 New Company invoice1 31.01.2005 500,00
14757152 New Company 2 invoice2 28.05.2007 1000,00
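A minimal Java sketch of this regression follows. It assumes that the "+" quantifiers in the Data2 pattern were lost when the post was copied, and restores them. It runs that pattern over both versions of the sample data with a find() loop like the one the question describes. The class, method and constant names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OldPatternRegression {
    // The question's Data2 pattern; the "+" quantifiers are restored (an assumption).
    static final Pattern DATA2_ROW = Pattern.compile(
            "(?<caseNumber>\\d+) +?(?<companyName>.*) +?(?<invoiceNumber>\\S*) +?\\d{2}\\.\\d{2}\\.\\d{4}");

    // Sample data before the change: Data1 dates use slashes.
    public static final String OLD_INPUT = String.join("\n",
            "Data1",
            "some changing text",
            "12406943 Old Company New Company reason Something 1/2/2005 10,00",
            "14757152 Old Company 2 New Company 2 Reason 2 Something2 10/7/2007 8,00",
            "",
            "Data2",
            "some changing text",
            "12406943 New Company invoice1 31.01.2005 500,00",
            "14757152 New Company 2 invoice2 28.05.2007 1000,00");

    // Sample data after the change: Data1 dates use dd.MM.yyyy as well.
    public static final String NEW_INPUT = String.join("\n",
            "Data1",
            "some changing text",
            "12406943 Old Company New Company reason Something 02.01.2005 10,00",
            "14757152 Old Company 2 New Company 2 Reason 2 Something2 07.10.2007 8,00",
            "",
            "Data2",
            "some changing text",
            "12406943 New Company invoice1 31.01.2005 500,00",
            "14757152 New Company 2 invoice2 28.05.2007 1000,00");

    // Calls find() in a loop and collects the caseNumber of every match.
    public static List<String> caseNumbers(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = DATA2_ROW.matcher(text);
        while (m.find()) {
            found.add(m.group("caseNumber"));
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(caseNumbers(OLD_INPUT)); // [12406943, 14757152]
        System.out.println(caseNumbers(NEW_INPUT)); // [12406943, 14757152, 12406943, 14757152]
    }
}
```

With the old data, only Data2 contains a dotted date, so two rows are found. With the new data, all four rows match, because nothing in the pattern refers to the Data2 header.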

I wanted to use a positive lookbehind to check that the Data2 text appears before the Data2 rows, but it returns only the first row:

(?<=Data2\Rsome changing text\R)(?<caseNumber>\d+) +?(?<companyName>.*) +?(?<invoiceNumber>\S*) +?\d{2}\.\d{2}\.\d{4}

In my Java code, I find a row with the Matcher's find() method and call it in a while loop to process the rows one by one. With the pattern above, it returns only one row, which is not what I want.
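The sketch below reproduces the lookbehind attempt, with the "+" quantifiers assumed restored. The class and method names are hypothetical.

A lookbehind is evaluated at the position where a match starts. Only the first Data2 row is directly preceded by "Data2" and "some changing text", so the find() loop returns a single row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookbehindFirstRowOnly {
    // The question's lookbehind attempt; the "+" quantifiers are restored (an assumption).
    static final Pattern PATTERN = Pattern.compile(
            "(?<=Data2\\Rsome changing text\\R)"
                    + "(?<caseNumber>\\d+) +?(?<companyName>.*) +?(?<invoiceNumber>\\S*) +?\\d{2}\\.\\d{2}\\.\\d{4}");

    // The question's second sample, where Data1 and Data2 share the dd.MM.yyyy format.
    public static final String INPUT = String.join("\n",
            "Data1",
            "some changing text",
            "12406943 Old Company New Company reason Something 02.01.2005 10,00",
            "14757152 Old Company 2 New Company 2 Reason 2 Something2 07.10.2007 8,00",
            "",
            "Data2",
            "some changing text",
            "12406943 New Company invoice1 31.01.2005 500,00",
            "14757152 New Company 2 invoice2 28.05.2007 1000,00");

    // The same kind of find() loop as in the question; collects every caseNumber.
    public static List<String> caseNumbers(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = PATTERN.matcher(text);
        while (m.find()) {
            found.add(m.group("caseNumber"));
        }
        return found;
    }

    public static void main(String[] args) {
        // Only the row directly below the header satisfies the lookbehind.
        System.out.println(caseNumbers(INPUT)); // [12406943]
    }
}
```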

Do you have any idea how to define that regex, or how to enable some kind of multiline search for the data rows that follow a given text? Maybe it's a novice mistake, but I can't see it right now :)

If I try to use a quantifier and treat the main data as a group, it captures only the last occurrence:

(?<=Data2\Rsome changing text\R)((?<caseNumber>\d+) +?(?<companyName>.*) +?(?<invoiceNumber>\S*) +?\d{2}\.\d{2}\.\d{4}.*\R)+
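Here is a sketch of the repeated-group attempt, again assuming the "+" quantifiers are restored. The names are hypothetical.

The repeated group consumes the whole Data2 block in one match, so find() succeeds only once. java.util.regex keeps only the value from the last iteration of a repeated group, so caseNumber reports the second row.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedGroupLastCapture {
    // The question's attempt: the row is wrapped in a group and repeated with "+"
    // (the "+" signs are an assumption, restored from the scraped text).
    static final Pattern PATTERN = Pattern.compile(
            "(?<=Data2\\Rsome changing text\\R)"
                    + "((?<caseNumber>\\d+) +?(?<companyName>.*) +?(?<invoiceNumber>\\S*) +?"
                    + "\\d{2}\\.\\d{2}\\.\\d{4}.*\\R)+");

    // Data2 block only; the trailing "\n" lets the last row satisfy the group's final \R.
    public static final String INPUT = "Data2\n"
            + "some changing text\n"
            + "12406943 New Company invoice1 31.01.2005 500,00\n"
            + "14757152 New Company 2 invoice2 28.05.2007 1000,00\n";

    // Counts find() hits; both rows are swallowed by a single match.
    public static int matchCount(String text) {
        Matcher m = PATTERN.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // caseNumber of the first match, i.e. the value captured in the group's last iteration.
    public static String firstMatchCaseNumber(String text) {
        Matcher m = PATTERN.matcher(text);
        return m.find() ? m.group("caseNumber") : null;
    }

    public static void main(String[] args) {
        System.out.println(matchCount(INPUT));           // 1
        System.out.println(firstMatchCaseNumber(INPUT)); // 14757152
    }
}
```

Matcher offers no way to read the values from earlier iterations of a repeated group. That is why the rows have to be matched one per find() call instead.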

Answer:

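In Java, you can use the \G anchor instead of a lookbehind assertion. The lookbehind is checked at the position where each match starts, and only the first row comes directly after the Data2 header, so it can match just once. \G instead anchors each match at the end of the previous one, so the rows can be chained.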

(?:^Data2\Rsome changing text|\G(?!^))\R(?<caseNumber>\d+)\h+(?<companyName>\S.*?)\h+(?<invoiceNumber>\S+)\h+\d{2}\.\d{2}\.\d{4}\b.*

Explanation
  • (?: Non capture group
    • ^Data2\Rsome changing text Match the two header lines literally, starting at the beginning of a line (in Java this requires the MULTILINE flag, Pattern.MULTILINE or (?m))
    • | Or
    • \G(?!^) Assert the current position at the end of the previous match, not at the start
  • )\R Close the non-capture group and match a Unicode newline sequence
  • (?<caseNumber>\d+)\h+ Capture 1+ digits, then match 1+ horizontal whitespace chars
  • (?<companyName>\S.*?)\h+ Capture a single non-whitespace char followed by any chars (as few as possible), then match 1+ horizontal whitespace chars
  • (?<invoiceNumber>\S+)\h+ Capture 1+ non-whitespace chars, then match 1+ horizontal whitespace chars
  • \d{2}\.\d{2}\.\d{4}\b Match a date-like pattern such as 31.01.2005, followed by a word boundary
  • .* Match the rest of the line
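Here is a minimal sketch of the pattern above inside a find() loop, assuming Java 8 or later (for \h and \R). Pattern.MULTILINE is set so that ^ can match at the start of the Data2 line. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GAnchorRows {
    // The answer's pattern. MULTILINE lets ^ match at the start of the "Data2" line;
    // \G(?!^) continues exactly where the previous match ended.
    static final Pattern ROW = Pattern.compile(
            "(?:^Data2\\Rsome changing text|\\G(?!^))\\R"
                    + "(?<caseNumber>\\d+)\\h+(?<companyName>\\S.*?)\\h+(?<invoiceNumber>\\S+)\\h+"
                    + "\\d{2}\\.\\d{2}\\.\\d{4}\\b.*",
            Pattern.MULTILINE);

    // The question's second sample, where Data1 and Data2 share the dd.MM.yyyy format.
    public static final String INPUT = String.join("\n",
            "Data1",
            "some changing text",
            "12406943 Old Company New Company reason Something 02.01.2005 10,00",
            "14757152 Old Company 2 New Company 2 Reason 2 Something2 07.10.2007 8,00",
            "",
            "Data2",
            "some changing text",
            "12406943 New Company invoice1 31.01.2005 500,00",
            "14757152 New Company 2 invoice2 28.05.2007 1000,00");

    // Returns [caseNumber, companyName, invoiceNumber] for each Data2 row, in order.
    public static List<List<String>> rows(String text) {
        List<List<String>> rows = new ArrayList<>();
        Matcher m = ROW.matcher(text);
        while (m.find()) {
            rows.add(Arrays.asList(
                    m.group("caseNumber"), m.group("companyName"), m.group("invoiceNumber")));
        }
        return rows;
    }

    public static void main(String[] args) {
        for (List<String> row : rows(INPUT)) {
            System.out.println(row);
        }
        // [12406943, New Company, invoice1]
        // [14757152, New Company 2, invoice2]
    }
}
```

Each find() after the first starts at the end of the previous match, which is where \G is anchored. The loop therefore walks down the Data2 rows and stops once a line no longer fits the row pattern, unless another Data2 header appears later in the input.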

