RegExp Match all text parts except given words

Time:04-09

I have a text and I need to match every part of it except certain given words, using a regular expression.

For example, if the text is 'Something went wrong and I could not do anything' and the given words are 'and', 'or' and 'not', then the result must be ['Something went wrong', 'I could', 'do anything'].

Please don't advise me to use string.split(), string.replace() and so on. I know several ways to do this with built-in methods. I'm wondering whether there is a regex that can do this when I execute text.match(/regexp/g).

Please note that the regular expression must work in Chrome, Firefox and Safari going back at least three versions from the current one. At the time of asking, the current versions are 100.0, 98.0.2 and 15.3 respectively. For example, you cannot use the lookbehind feature, because Safari does not support it.

Please, before answering my question, go to https://regexr.com/ and check your answer. Your regular expression should highlight all parts of a sentence, including spaces, except for the given words.

Before asking this question I searched on my own, but these links didn't help me. I also tried the non-accepted answers:

Match everything except for specified strings

Regex: match everything but a specific pattern

Regex to match all words except a given list

Regex to match all words except a given list (2)

Need to find a regular expression for any word except word1 or word2

Matching all words except one

Javascript match eveything except given words

CodePudding user response:

This is beyond the capabilities of regular expressions.

Regular expressions generally are restricted to patterns that can be produced by regular grammars (which is why they are called regular).

Some regular expression tools support features that go beyond this restriction, for example (negative) lookaheads or look-behinds, but these will not give you partial matches.

For the same reason, you cannot match opening and closing HTML tags using regular expressions.

CodePudding user response:

You can do this with the alternation operator | and capture groups.

/^(.*)( and | or )(((.*)( not )(.*))|(.*))$/i

Breaking that down:

  • Any characters from the start of the string up to " and " or " or " : ^(.*)( and | or )

  • All remaining characters: (((.*)( not )(.*))|(.*))$

    • Either two groups separated by " not " : ((.*)( not )(.*))
    • Or, the remaining characters: (.*)

When using String.match the output array will get populated based on what groups were found.

  • matches[0] will be the whole string

  • matches[1] will be the intro text "Something went wrong"

  • matches[2] will be either " and " or " or "

  • matches[6] will be either undefined or " not "

  • if matches[6] == " not ", matches[5] will be the text before, matches[7] will be the text after

  • if matches[6] == undefined, matches[3] will be the remainder of the string
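
The index rules above can be collected into a small helper. This is only a sketch under the assumption that the same pattern is used; the name extractParts is hypothetical and chosen for illustration:

```javascript
// Hypothetical helper: turns the match array described above into a list of text parts.
function extractParts(input) {
  const matches = input.match(/^(.*)( and | or )(((.*)( not )(.*))|(.*))$/i);
  if (matches === null) {
    // No " and " / " or " separator was found: treat the whole input as one part.
    return [input];
  }
  if (matches[6] !== undefined) {
    // " not " was found: intro text, text before " not ", text after " not ".
    return [matches[1], matches[5], matches[7]];
  }
  // No " not ": intro text plus the remainder of the string.
  return [matches[1], matches[3]];
}

console.log(extractParts('Something went wrong and I could not do anything'));
// → ['Something went wrong', 'I could', 'do anything']
```

Note that this only handles one " and " / " or " separator followed by at most one " not ", which is all the pattern itself covers.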

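// Run the pattern against the input and log the resulting match array (null if nothing matches).
function test(input) {
  const matches = input.match(/^(.*)( and | or )(((.*)( not )(.*))|(.*))$/i);
  console.log(matches);
  return matches;
}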

test('Something went wrong and I could not do anything');
test('Something went wrong and I could recover');
test('Something went wrong or whatever');
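
The pattern above depends on a fixed word order and on capture groups, while the question asks for something usable as text.match(/regexp/g). One possible alternative is a negative lookahead that consumes characters until an excluded word begins; it needs no lookbehind. The following is a sketch, not a definitive solution: the helper name buildExceptRegex is hypothetical, and it assumes single-line text and excluded words made only of word characters. For the words 'and', 'or' and 'not' the generated pattern is /(?:\b|(?=\W))(?:(?!\b(?:and|or|not)\b).)+/g, which can be pasted into https://regexr.com/ for checking.

```javascript
// Hypothetical helper: builds a global regex that matches every stretch of text
// containing none of the given words as a whole word.
// Assumes the words consist of word characters only (letters, digits, underscore).
function buildExceptRegex(words) {
  // Escape regex metacharacters defensively before joining with "|".
  const escaped = words.map((w) => w.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
  const excluded = `\\b(?:${escaped.join('|')})\\b`;
  // (?:\b|(?=\W)) keeps a match from starting in the middle of a skipped word;
  // (?:(?!excluded).)+ consumes characters until an excluded word begins.
  return new RegExp(`(?:\\b|(?=\\W))(?:(?!${excluded}).)+`, 'g');
}

const text = 'Something went wrong and I could not do anything';
const parts = text.match(buildExceptRegex(['and', 'or', 'not']));
console.log(parts);
// → ['Something went wrong ', ' I could ', ' do anything']
```

The matches keep their surrounding spaces, which is what the question asks the highlight to include. Words such as "wrong" or "anything" are left intact because the excluded words are only recognised between word boundaries.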
