Regex to find string that may contain single space that is followed by multiple whitespace

Time:10-06

Given:

Multiline text:

SECTION 1 TEXT                       201
THIS NEEDS TO BE CAPTURED #1         500
SECTION 2 TEXT                       202
THIS NEEDS TO BE CAPTURED #2         502
NoSpaceShouldBeCaptured              123
THIS SHOULD    BE IGNORED            000

Required:

Capture every string at the beginning of a line that may contain single spaces and is followed by multiple whitespace characters, which are in turn followed by three digits at the end of the line. Do NOT capture anything that starts with the word "SECTION".

Desired Result:

"THIS NEEDS TO BE CAPTURED #1", "500"
"THIS NEEDS TO BE CAPTURED #2", "502"
"NoSpaceShouldBeCaptured", "123"

My attempts:

^(.+?)(?=\s{2,})\s{2,}(\d{3})$ - this works fine, but includes "SECTION 2 TEXT", which should be ignored.

^(?!SECTION)(.+?)(?=\s{2,})\s{2,}(\d{3})$ - this still captures "SECTION 2 TEXT", which should be ignored.

How do I ignore strings that start with the word SECTION?

EDIT

Additional requirement:

Do not capture "THIS SHOULD    BE IGNORED": "THIS SHOULD" is followed by multiple whitespace, but then by "BE IGNORED" instead of three digits.
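To see why the lazy (.+?) in the attempts still matches the last sample line, here is a minimal sketch using Python's re module. Python is an assumption (the question does not name a regex engine), as is the sample text having no trailing spaces; re.MULTILINE is turned on so that ^ and $ apply to each line. The lazy group first tries to stop after "THIS SHOULD", fails because "BE" is not three digits, and then keeps growing until it reaches "THIS SHOULD    BE IGNORED", which is followed by whitespace and "000".

```python
import re

# Sample text from the question (assumed to have no trailing spaces).
TEXT = "\n".join([
    "SECTION 1 TEXT                       201",
    "THIS NEEDS TO BE CAPTURED #1         500",
    "SECTION 2 TEXT                       202",
    "THIS NEEDS TO BE CAPTURED #2         502",
    "NoSpaceShouldBeCaptured              123",
    "THIS SHOULD    BE IGNORED            000",
])

# Second attempt from the question. Nothing stops "." from consuming a
# double space, so Group 1 can swallow "THIS SHOULD    BE IGNORED".
ATTEMPT = re.compile(r"^(?!SECTION)(.+?)(?=\s{2,})\s{2,}(\d{3})$", re.MULTILINE)

for label, digits in ATTEMPT.findall(TEXT):
    print(repr(label), digits)
```

In this Python run the SECTION lines are already rejected by (?!SECTION); the remaining problem is that the last line printed is 'THIS SHOULD    BE IGNORED' 000.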

Answer:

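You can use the following pattern with the multiline flag enabled, so that ^ and $ match at every line (for example re.MULTILINE in Python or the m flag in JavaScript and PCRE):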

^(?!SECTION)((?:(?!\s{2}).)+?)\s{2,}(\d{3})$
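A minimal sketch of applying this pattern to the sample text, assuming Python's re module with re.MULTILINE (the question does not say which engine is used). Each findall result is a (Group 1, Group 2) tuple, printed here in the same format as the desired result.

```python
import re

# Sample text from the question (assumed to have no trailing spaces).
TEXT = "\n".join([
    "SECTION 1 TEXT                       201",
    "THIS NEEDS TO BE CAPTURED #1         500",
    "SECTION 2 TEXT                       202",
    "THIS NEEDS TO BE CAPTURED #2         502",
    "NoSpaceShouldBeCaptured              123",
    "THIS SHOULD    BE IGNORED            000",
])

# The answer's pattern; re.MULTILINE makes ^ and $ match at each line.
PATTERN = re.compile(r"^(?!SECTION)((?:(?!\s{2}).)+?)\s{2,}(\d{3})$", re.MULTILINE)

matches = PATTERN.findall(TEXT)
for label, digits in matches:
    print(f'"{label}", "{digits}"')
# Prints:
# "THIS NEEDS TO BE CAPTURED #1", "500"
# "THIS NEEDS TO BE CAPTURED #2", "502"
# "NoSpaceShouldBeCaptured", "123"
```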


Details:

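  • ^ - start of a line (with the multiline flag; start of the string otherwise)
  • (?!SECTION) - negative lookahead: the text at the current position must not be SECTION, so lines that start with SECTION are skipped
  • ((?:(?!\s{2}).)+?) - Group 1: one or more characters other than line break characters, as few as possible, none of which starts a sequence of two whitespace characters (a "tempered" token), so the group can never contain a double space
  • \s{2,} - two or more whitespace characters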
  • (\d{3}) - Group 2: three digits
  • $ - end of a line (with the multiline flag; end of the string otherwise).
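The breakdown above maps one-to-one onto a verbose-mode version of the pattern. Below is a sketch assuming Python's re.VERBOSE flag, in which whitespace in the pattern is ignored and # starts a comment; the helper parse_line is hypothetical and only checks one line at a time.

```python
import re
from typing import Optional, Tuple

# The answer's pattern, annotated token by token with re.VERBOSE.
# re.MULTILINE is kept so the same object also works with finditer()
# on a whole multiline string.
PATTERN = re.compile(
    r"""
    ^                    # start of the line
    (?!SECTION)          # reject lines that begin with SECTION
    (                    # Group 1
      (?:(?!\s{2}).)+?   # lazy run of chars, none of which starts a double whitespace
    )
    \s{2,}               # two or more whitespace chars
    (\d{3})              # Group 2: three digits
    $                    # end of the line
    """,
    re.VERBOSE | re.MULTILINE,
)


def parse_line(line: str) -> Optional[Tuple[str, str]]:
    """Hypothetical helper: return (label, digits) for one line, or None."""
    m = PATTERN.match(line)
    return (m.group(1), m.group(2)) if m else None


print(parse_line("NoSpaceShouldBeCaptured              123"))
print(parse_line("THIS SHOULD    BE IGNORED            000"))
print(parse_line("SECTION 2 TEXT                       202"))
```

The second and third calls return None: the tempered token cannot step over the four spaces after "THIS SHOULD", and the lookahead rejects the SECTION line. Note that (?!SECTION) also rejects lines such as "SECTIONS ..."; if only the whole word should be excluded, (?!SECTION\b) is one possible variant.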