Regular expression to select words not starting with one of a set of prefixes

Time:12-12

I am trying to figure out a regular expression that selects all words that do NOT begin with one of a set of prefixes.

For example, with the allowable word prefixes jan|feb|mar|apr, I'd want to match the words in, or, I, off, to, and see in the following string:

"in january or feb I marched off to see april"

I managed to select the exact opposite of what I'd like, matching words beginning with the prefixes:

(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)(?:\w+)?

I also managed to select all words that were not the prefixes themselves, but this doesn't handle all words beginning with the prefixes, just words that are the prefix:

[a-z]+\b(?<!\bjan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)
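
To make the behaviour of these two attempts concrete, here is a small Python sketch that runs both against the sample string. It assumes the quantifiers are `\w+` and `[a-z]+`, and the variable names (`months`, `with_prefix`, `not_bare_prefix`) are only illustrative.

```python
import re

text = "in january or feb I marched off to see april"
months = "jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec"

# First attempt: a month prefix followed by any remaining word characters.
# This selects exactly the words that should be kept, not the ones to strip.
with_prefix = re.findall(rf"(?:{months})(?:\w+)?", text)
print(with_prefix)      # ['january', 'feb', 'marched', 'april']

# Second attempt: the negative lookbehind only inspects the three characters
# just before the end of the word, so it rejects the bare prefix "feb" but
# still accepts "january", "marched" and "april". "I" is skipped only because
# [a-z] is lowercase-only.
not_bare_prefix = re.findall(rf"[a-z]+\b(?<!\b{months})", text)
print(not_bare_prefix)  # ['in', 'january', 'or', 'marched', 'off', 'to', 'see', 'april']
```

The lookbehind checks the end of each word, while the requirement is about the start of each word, which is why a lookahead placed right after a word boundary is a better fit.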

The ultimate goal is to strip all words that do not begin with one of the prefixes from the input string.

CodePudding user response:

"The ultimate goal is to strip all words that do not begin with one of the prefixes from the input string."

You may use this regex for matching:

\b(?!(?:jan|feb|mar|apr|may|ju[nl]|aug|sep|oct|nov|dec))\w+\s*

and replace it with an empty string.
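
As a rough illustration of that replacement, here is a minimal Python sketch using `re.sub`. It assumes the quantifier after `\w` is `+`; the function name `keep_month_words` is made up for this example.

```python
import re

# Words that do NOT start with a month prefix, plus any trailing whitespace.
NON_MONTH_WORD = re.compile(
    r"\b(?!(?:jan|feb|mar|apr|may|ju[nl]|aug|sep|oct|nov|dec))\w+\s*"
)

def keep_month_words(text: str) -> str:
    """Strip every word that does not begin with one of the month prefixes."""
    return NON_MONTH_WORD.sub("", text)

result = keep_month_words("in january or feb I marched off to see april")
print(result)  # january feb marched april
```

The pattern is case-sensitive as written, so "January" would be stripped too; passing `re.IGNORECASE` to `re.compile` would be one way to change that, if that is what you need. A kept word followed by a stripped word leaves its trailing space behind, so you may want to call `.strip()` on the result.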


RegEx Details:

  • \b: Word boundary
  • (?!: Start negative lookahead
    • (?:jan|feb|mar|apr|may|ju[nl]|aug|sep|oct|nov|dec): Match one of the 3-letter month prefixes
  • ): End negative lookahead
  • \w+: Match 1 or more word characters
  • \s*: Match 0 or more whitespace characters
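
The breakdown above can also be written straight into the pattern with Python's `re.VERBOSE` flag, which ignores whitespace and allows `#` comments inside the regex. This is only a sketch (assuming `\w+` for the word part, with the name `ANNOTATED` chosen for illustration), and it lists what the pattern matches rather than replacing it.

```python
import re

ANNOTATED = re.compile(
    r"""
    \b          # word boundary
    (?!         # start negative lookahead
        (?:jan|feb|mar|apr|may|ju[nl]|aug|sep|oct|nov|dec)  # a 3-letter month prefix
    )           # end negative lookahead
    \w+         # 1 or more word characters
    \s*         # 0 or more whitespace characters
    """,
    re.VERBOSE,
)

matches = ANNOTATED.findall("in january or feb I marched off to see april")
print(matches)  # ['in ', 'or ', 'I ', 'off ', 'to ', 'see ']
```

Each match includes the whitespace that follows the word, which is what lets the replacement remove the word without leaving a double space behind.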