I am trying to figure out a regular expression that selects all words that do NOT begin with one of a set of prefixes.
For example, with the allowed word prefixes jan|feb|mar|apr,
I'd want to match the words "in", "or", "I", "off", "to", and "see" in the following string:
"in january or feb I marched off to see april"
I managed to select the exact opposite of what I'd like, matching words beginning with the prefixes:
(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)(?:\w+)?
I also managed to select all words that are not themselves one of the prefixes, but this only excludes words that are exactly a prefix, not words that merely begin with one:
[a-z]+\b(?<!\bjan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)
The ultimate goal is to strip all words that do not begin with one of the prefixes from the input string.
CodePudding user response:
"The ultimate goal is to strip all words that do not begin with one of the prefixes from the input string."
You may use this regex for matching:
\b(?!(?:jan|feb|mar|apr|may|ju[nl]|aug|sep|oct|nov|dec))\w+\s*
and replace it with an empty string.
RegEx Details:
\b : Word boundary
(?! : Start negative lookahead
(?:jan|feb|mar|apr|may|ju[nl]|aug|sep|oct|nov|dec) : Match one of the 3-letter month prefixes
) : End negative lookahead
\w+ : Match 1 or more word characters
\s* : Match 0 or more whitespace characters
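The question does not say which language is in use, so here is a minimal sketch in Python, assuming its standard `re` module and the regex above with `\w+` for the word part. The names `NON_MONTH_WORD` and `strip_non_month_words` are made up for illustration.

```python
import re

# A word that does NOT start with a month prefix, plus any whitespace
# that follows it. The negative lookahead is checked at each word start.
NON_MONTH_WORD = re.compile(
    r"\b(?!(?:jan|feb|mar|apr|may|ju[nl]|aug|sep|oct|nov|dec))\w+\s*"
)


def strip_non_month_words(text: str) -> str:
    """Remove every word that does not begin with one of the prefixes."""
    # Replace each match with an empty string, then trim leftover edge spaces.
    return NON_MONTH_WORD.sub("", text).strip()


sample = "in january or feb I marched off to see april"

# Words the pattern selects (trailing whitespace trimmed for display).
print([m.strip() for m in NON_MONTH_WORD.findall(sample)])
# ['in', 'or', 'I', 'off', 'to', 'see']

# The input with those words stripped out.
print(strip_non_month_words(sample))
# january feb marched april
```

As written, the pattern is case-sensitive, so "January" would be stripped as well. If that is not what you want, passing `re.IGNORECASE` to `re.compile` should make the prefixes match in any case.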