Given:
Multiline text:
SECTION 1 TEXT    201
THIS NEEDS TO BE CAPTURED #1    500
SECTION 2 TEXT    202
THIS NEEDS TO BE CAPTURED #2    502
NoSpaceShouldBeCaptured    123
THIS SHOULD    BE IGNORED    000
Required:
Capture every string that starts at the beginning of a line, may contain spaces, and is followed by two or more whitespace characters and then exactly three digits at the end of the line. Do NOT capture anything that starts with the word "SECTION".
Desired Result:
"THIS NEEDS TO BE CAPTURED #1", "500"
"THIS NEEDS TO BE CAPTURED #2", "502"
"NoSpaceShouldBeCaptured", "123"
My attempts:
^(.+?)(?=\s{2,})\s{2,}(\d{3})$
- this works fine, but includes "SECTION 2 TEXT", which should be ignored.
^(?!SECTION)(.+?)(?=\s{2,})\s{2,}(\d{3})$
- this still captures "SECTION 2 TEXT", which should be ignored.
How do I ignore strings that start with the word SECTION?
EDIT
Additional requirement:
Do not capture "THIS SHOULD BE IGNORED": "THIS SHOULD" is followed by multiple whitespace, but what comes next is "BE IGNORED" rather than three digits.
Answer:
You can use
^(?!SECTION)((?:(?!\s{2}).)+?)\s{2,}(\d{3})$
Details:
^ - start of string (start of each line when the multiline flag is on, which this multiline input needs)
(?!SECTION) - there can be no SECTION substring at the current location
((?:(?!\s{2}).)+?) - Group 1: any one char other than line break chars, one or more occurrences but as few as possible, that does not start a double whitespace char sequence
\s{2,} - two or more whitespaces
(\d{3}) - Group 2: three digits
$ - end of string (end of each line when the multiline flag is on).
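To check the pattern against the sample, here is a minimal Python sketch. The question does not name a regex flavor, so Python's `re` module with `re.MULTILINE` is an assumption, as are the four-space gaps in the sample and the names `GAP`, `TEXT`, `PATTERN`, `NAIVE` and `capture`. `NAIVE` is the question's second attempt with the redundant lookahead dropped, included only to show why Group 1 must not be allowed to cross a double-whitespace run.

```python
import re

# Assumed gap width: the sample only requires "two or more" whitespace chars.
GAP = " " * 4

TEXT = "\n".join([
    "SECTION 1 TEXT" + GAP + "201",
    "THIS NEEDS TO BE CAPTURED #1" + GAP + "500",
    "SECTION 2 TEXT" + GAP + "202",
    "THIS NEEDS TO BE CAPTURED #2" + GAP + "502",
    "NoSpaceShouldBeCaptured" + GAP + "123",
    "THIS SHOULD" + GAP + "BE IGNORED" + GAP + "000",
])

# Pattern from the answer: Group 1 may not step onto a char that starts
# a run of two whitespace chars, so it stops at the first wide gap.
PATTERN = re.compile(r"^(?!SECTION)((?:(?!\s{2}).)+?)\s{2,}(\d{3})$", re.MULTILINE)

# Plain lazy dot for comparison: Group 1 is free to swallow a wide gap.
NAIVE = re.compile(r"^(?!SECTION)(.+?)\s{2,}(\d{3})$", re.MULTILINE)


def capture(text):
    """Return (label, digits) tuples for every line the pattern accepts."""
    return PATTERN.findall(text)


if __name__ == "__main__":
    for label, digits in capture(TEXT):
        print(repr(label), repr(digits))
    # The naive pattern also returns the "THIS SHOULD ... BE IGNORED" line.
    print(len(NAIVE.findall(TEXT)), "matches with the naive pattern")
```

Without `re.MULTILINE`, `^` and `$` would anchor only at the start and end of the whole text, so lines in the middle of the block would never match.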