Regex: Ignore punctuation when selecting part of a string

Time:12-28

I'm looking for a way to select a part of a string with punctuation based on a string that doesn't have punctuation.

Ex.

Oh, my goodness. This is it. Oh.

I want to select Oh, my goodness. (note the trailing period). The string that I have to search with is:

oh my goodness

I've been looking all around for a solution to this, but I can't seem to find a good answer. Can anyone help me?

CodePudding user response:

Your question lacks some details, so here are some assumptions:

  • your space-separated search term is an ordered sequence of words to find, e.g. the search term foo bar will not match the input some bar foo text
  • your search term should ignore non-word chars between the words, e.g. foo bar will match some foo, bar text and some foo: bar text
  • you want to find the search term anywhere in the input
  • the match should include a trailing dot if there is one (i.e. the dot is optional)

The regex can be tweaked as needed if some of the assumptions are not correct.

Code with match and replace examples:

const input = 'Oh, my goodness. This is it. Oh.';
const searchTerm = 'oh my goodness';

const regex = new RegExp('\\b' + searchTerm.replace(/ +/g, '\\W+') + '\\.?', 'i');
console.log({
  match: input.match(regex),
  replace: input.replace(regex, '<b>$&</b>')
});

Output:

{
  "match": [
    "Oh, my goodness."
  ],
  "replace": "<b>Oh, my goodness.</b> This is it. Oh."
}

Explanation of regex construct:

  • '\\b' -- word boundary (replace with '^' if you want to search at the beginning of the input string)
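  • searchTerm.replace(/ +/g, '\\W+') -- replace each run of spaces with \W+, which allows any non-word chars between the words, such as , or :
  • '\\.?' -- include an optional trailing dot (the backslash must be doubled inside a string literal; a single '\.?' would become .? and match any character)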
  • 'i' -- regex flag to ignore case
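
The construct above can be wrapped in a small helper. This is only a sketch under the assumptions listed earlier: the name buildPhraseRegex is hypothetical, and the escaping step is an addition that the answer does not include. It keeps a search term containing regex metacharacters from changing the meaning of the pattern.

```javascript
// Hypothetical helper (not part of the original answer): builds the same
// pattern from an arbitrary search term.
function buildPhraseRegex(searchTerm) {
  const words = searchTerm
    .trim()
    .split(/\s+/)
    // Assumption: the term should be matched literally, so regex
    // metacharacters in each word are escaped.
    .map((word) => word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
  // \b anchors the first word, \W+ allows punctuation or whitespace between
  // words, and \.? picks up one optional trailing dot.
  return new RegExp('\\b' + words.join('\\W+') + '\\.?', 'i');
}

const phraseRegex = buildPhraseRegex('oh my goodness');
const found = 'Oh, my goodness. This is it. Oh.'.match(phraseRegex);
console.log(found ? found[0] : null); // "Oh, my goodness."
```

Splitting on whitespace and joining with \W+ gives the same result as the replace call in the answer, but it also tolerates leading or trailing spaces in the search term.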

CodePudding user response:

You can replace each space in the search term with .* so that any characters are accepted between the words:

const text = 'Oh, my goodness. This is it. Oh.';
const search = 'oh my goodness';

const expression = new RegExp(`${search.replace(/ /g, '.*')}[^.]*\\.*`, 'i');

const [match] = expression.exec(text);

console.log(match)
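
One caveat with the snippet above: exec returns null when nothing matches, so destructuring its result throws a TypeError. A minimal sketch of a guarded version follows; the function name findSentence is hypothetical and the pattern itself is unchanged.

```javascript
// Hypothetical wrapper (not part of the original answer) around the same pattern.
function findSentence(text, search) {
  const expression = new RegExp(`${search.replace(/ /g, '.*')}[^.]*\\.*`, 'i');
  const result = expression.exec(text);
  // exec returns null when there is no match, so check before reading index 0.
  return result === null ? null : result[0];
}

console.log(findSentence('Oh, my goodness. This is it. Oh.', 'oh my goodness')); // "Oh, my goodness."
console.log(findSentence('Nothing to see here.', 'oh my goodness')); // null
```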

CodePudding user response:

/[^.]*\b(oh|my)\b.(?=goodness)[^.]*\./Ug

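  • the leading [^.]* and the trailing [^.]*\. extend the match to the start and the end of a sentence (everything up to and including the next dot)
  • \b(oh|my)\b. matches the whole word oh or my, followed by any single character
  • (?=goodness) is a positive lookahead: it requires goodness to come right after that character, without consuming it
  • the flags are g (global) and U (ungreedy). Note that U is a PCRE modifier; JavaScript has no such flag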

In short, the regex matches every sentence in which oh or my comes directly before goodness, so it extracts the matching sentences from the given line.
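
Since the question concerns JavaScript and the U modifier belongs to PCRE, the pattern needs a small change to run there. The following is a sketch of one possible port, not the answer's own code: each quantifier is made lazy with a trailing ?, and the variable names are hypothetical.

```javascript
// Assumption: lazy quantifiers (*?) stand in for PCRE's U (ungreedy) modifier.
const sentencePattern = /[^.]*?\b(oh|my)\b.(?=goodness)[^.]*?\./g;
const line = 'Oh, my goodness. This is it. Oh.';

// matchAll requires the g flag and yields one result per matching sentence.
const sentences = Array.from(line.matchAll(sentencePattern), (m) => m[0]);
console.log(sentences); // [ 'Oh, my goodness.' ]
```

Like the original pattern, this one is case-sensitive, so the sentence is found through my rather than through the capitalized Oh.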

regex101.com
