Repeated group captures more matches than specified

Time:11-22

I am trying to match the following sequence, line by line:

  • start of line
  • maybe some space
  • the Kd string
  • at least one space, maybe more
  • either 3 or 4 float numbers whose format may be messy
  • maybe some space
  • end of line

The problem is that the 4th sample is also captured even though it has 5 numbers in it.

Pattern:

^\s*Kd\s+.*(?:[-+]?0*\d*\.?\d*){3,4}$

Samples:

Kd   1.0  0.1   0.0
   Kd   .0  4.   01.
  Kd   .0  4.   01.  01.
 Kd   .0  4.   01. 01. 01.
    Kd   1.0  0.1   0.0  0.0
  Kd   1.0  0.1   0.0

Expected captures:

  • 1.0, 0.1, 0.0
  • .0, 4., 01.
  • .0, 4., 01., 01.
  • failure
  • 1.0, 0.1, 0.0, 0.0
  • 1.0, 0.1, 0.0

Question:

What am I doing wrong in the regex so it also matches lines with more than 4 floats in them?

Answer:

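The main problem is the .* part: it matches zero or more characters other than a line feed (LF), as many as possible, so it can swallow any number of floats before the repeated group is even tried. On top of that, every element inside the repeated group is optional, so the group can match an empty string and {3,4} is satisfied without consuming a single number. You also need to put \s+ inside the repeated group so that whitespace is allowed between the numeric values.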
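To see the effect in isolation, here is a minimal sketch in Python (the question itself targets .NET, so treat this as an illustration of the same behaviour, not the original setup). It assumes the `+` quantifiers that appear to have been lost from the question's pattern, and the variable names are made up for the example.

```python
import re

# The question's pattern, with the `+` quantifiers assumed to be restored.
original = re.compile(r"^\s*Kd\s+.*(?:[-+]?0*\d*\.?\d*){3,4}$")

# The sample with five numbers that should have been rejected.
five_numbers = " Kd   .0  4.   01. 01. 01."
match_five = original.match(five_numbers)
print(match_five is not None)  # True: the line is accepted

# The repeated group on its own: every element is optional, so three
# repetitions can match the empty string.
group_only = re.compile(r"(?:[-+]?0*\d*\.?\d*){3,4}")
empty_ok = group_only.fullmatch("") is not None
print(empty_ok)  # True
```

Since `.*` can consume the whole tail of the line and the group can then match nothing three times, the `{3,4}` limit never constrains how many numbers are present.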

You can use

^\s*Kd(?:\s+([-+]?(?:\d*\.?\d+|\d+\.\d*))){3,4}$

See the .NET regex demo. Details:

  • ^ - start of string
  • \s* - zero or more whitespaces
  • Kd - a fixed string
  • (?:\s+([-+]?(?:\d*\.?\d+|\d+\.\d*))){3,4} - three to four occurrences of
    • \s+ - one or more whitespaces
    • ([-+]?(?:\d*\.?\d+|\d+\.\d*)) - Group 1:
      • [-+]? - an optional - or +
      • (?:\d*\.?\d+|\d+\.\d*) - either zero or more digits, an optional . and one or more digits, or one or more digits, a . and zero or more digits
  • $ - end of string.
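
As a quick check of the breakdown above, the following Python sketch runs the corrected pattern over the six samples from the question. It is only an approximation of the .NET demo: in .NET every repetition of Group 1 is available through the group's Captures collection, whereas Python's `re` keeps only the last repetition, so this sketch validates the line with the regex and then splits it on whitespace. The helper name `parse_kd` is hypothetical.

```python
import re

# Corrected pattern from the answer (with `+` quantifiers).
FIXED = re.compile(r"^\s*Kd(?:\s+([-+]?(?:\d*\.?\d+|\d+\.\d*))){3,4}$")

samples = [
    "Kd   1.0  0.1   0.0",
    "   Kd   .0  4.   01.",
    "  Kd   .0  4.   01.  01.",
    " Kd   .0  4.   01. 01. 01.",
    "    Kd   1.0  0.1   0.0  0.0",
    "  Kd   1.0  0.1   0.0",
]

def parse_kd(line):
    """Return the numbers after Kd as strings, or None if the line is rejected."""
    if FIXED.match(line) is None:
        return None
    # The line is known to be "Kd" followed by 3 or 4 numbers,
    # so every whitespace-separated token after the first is a number.
    return line.split()[1:]

results = [parse_kd(s) for s in samples]
for sample, numbers in zip(samples, results):
    print(repr(sample), "->", numbers)
```

The fourth sample, with five numbers, yields `None`; the others yield their three or four numbers. Note that the pattern as written does not allow trailing whitespace before the end of the line; if the "maybe some space" step from the question is needed, ending the pattern with `\s*$` instead of `$` would cover it.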