Regex: Ignore punctuation when selecting part of a string

I'm looking for a way to select a part of a string with punctuation based on a string that doesn't have punctuation.

Ex.

Oh, my goodness. This is it. Oh.

I want to select Oh, my goodness. (note the trailing period). The string that I have to search with is:

oh my goodness

I've been looking all around for a solution to this, but I can't seem to find a good answer. Can anyone help me?

CodePudding user response：

Your question lacks some details, so here are some assumptions:

your space-separated search term is a sequence of words to find in that order, e.g. the search term foo bar will not match the input some bar foo text
your search term should ignore non-word chars, e.g. foo bar will match some foo, bar text and some foo: bar text
you want to find the search term anywhere in the input
a trailing dot is included in the match if present (i.e. it is optional)

The regex can be tweaked as needed if some of the assumptions are not correct.

Code with match and replace examples:

const input = 'Oh, my goodness. This is it. Oh.';
const searchTerm = 'oh my goodness';

const regex = new RegExp('\\b' + searchTerm.replace(/ +/g, '\\W+') + '\\.?', 'i');
console.log({
  match: input.match(regex),
  replace: input.replace(regex, '<b>$&</b>')
});

Output:

{
  "match": [
    "Oh, my goodness."
  ],
  "replace": "<b>Oh, my goodness.</b> This is it. Oh."
}

Explanation of regex construct:

'\\b' -- word boundary (replace with '^' if you want to search at the beginning of the input string)
searchTerm.replace(/ +/g, '\\W+') -- replace each run of spaces with one or more non-word chars, such as a comma or a colon followed by a space
'\\.?' -- include optional dot
'i' -- regex flag to ignore case
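
The construct above inserts the search term into the pattern as-is, so a term containing regex metacharacters (for example a dot or a question mark) would change the meaning of the regex. A minimal sketch of one way to guard against that, assuming the search term should always be treated as literal text; escapeRegExp and buildSearchRegex are hypothetical helper names, not part of the answer above:

```javascript
// Hypothetical helper: escapes regex metacharacters so the text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: builds a case-insensitive regex that tolerates
// punctuation between the words and includes one optional trailing dot.
function buildSearchRegex(searchTerm) {
  const words = searchTerm.trim().split(/\s+/).map(escapeRegExp);
  return new RegExp('\\b' + words.join('\\W+') + '\\.?', 'i');
}

const sample = 'Oh, my goodness. This is it. Oh.';
const found = sample.match(buildSearchRegex('oh my goodness'));

// match() returns null when nothing is found, so guard before reading [0].
console.log(found ? found[0] : null); // Oh, my goodness.
```

Splitting on whitespace and joining with \W+ has the same effect as the replace call, but lets each word be escaped on its own first.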

CodePudding user response：

You can replace each space in the search term with .* so that any characters are accepted between the words:

const text = 'Oh, my goodness. This is it. Oh.';
const search = 'oh my goodness';

const expression = new RegExp(`${search.replace(/ /g, '.*')}[^.]*\\.*`, 'i');

const [match] = expression.exec(text);

console.log(match)
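
One thing to keep in mind with this approach: .* accepts any text between the words, including whole unrelated sentences, and destructuring the result of exec() throws when there is no match. The sketch below compares .* with \W+ on a made-up input; the sample text and the firstMatch helper are assumptions for illustration only:

```javascript
const passage = 'Oh no. That is my bag. Such goodness.';
const term = 'oh my goodness';

// Loose: any characters may sit between the words.
const loose = new RegExp(term.replace(/ /g, '.*'), 'i');
// Strict: only non-word characters may sit between the words.
const strict = new RegExp(term.replace(/ +/g, '\\W+'), 'i');

// exec() returns null when nothing matches, so guard before reading [0].
const firstMatch = (regex, text) => {
  const result = regex.exec(text);
  return result ? result[0] : null;
};

console.log(firstMatch(loose, passage));  // Oh no. That is my bag. Such goodness
console.log(firstMatch(strict, passage)); // null
```

Which behaviour is preferable depends on whether other words are allowed to appear between the search words.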

CodePudding user response：

/[^.]*\b(oh|my)\b.(?=goodness)[^.]*\./Ug

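[^.]* at the start and [^.]*\. at the end extend the match to the start and the end of the sentence
\b(oh|my)\b. matches the word oh or my, followed by any single character
(?=goodness) is a positive lookahead. It tells the regex: 'Only match oh or my when the word goodness comes right after it'
also, we use the g (global) and U (ungreedy) regex flags. U is a PCRE flag (available on regex101.com) and is not supported by JavaScript regular expressions.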

In short, the regex will match all the sentences containing mentioned words and will separate the given line into matching sentences.
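
Since the question's other answers use JavaScript, here is a rough sketch of how the same pattern might be used there. It drops the U flag and adds the i flag; both changes are assumptions on my part, not part of the regex above, and the variable names are illustrative:

```javascript
// Same pattern without the PCRE-only U flag. The i flag is an addition
// so that a capitalised "Oh" or "My" is recognised as well.
const sentencePattern = /[^.]*\b(?:oh|my)\b.(?=goodness)[^.]*\./gi;

const line = 'Oh, my goodness. This is it. Oh.';

// match() with the g flag returns every match, or null when there is none.
// trim() removes the space that follows the previous sentence's dot.
const sentences = (line.match(sentencePattern) || []).map((s) => s.trim());

console.log(sentences); // [ 'Oh, my goodness.' ]
```

The greedy [^.]* still cannot cross a dot, so each match stays within one sentence.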

regex101.com