Regex: bypass some punctuation in text

I have a text like this:

text = 'hello, how are you?'

I want to extract 'hello, how' from the text, but searching without the comma finds nothing:

re.search('hello how', text)
>>> None

If you are wondering why I don't just include the comma: the string I want to extract comes from some other text and is used as the input regex. This input regex has no punctuation, while the text does. So I want the regex to skip over punctuation, for example the , after hello.

_________________________                 \     ______________________________________
| Input Regex           |      ------------\    | Text from which I have to extract  |
| (Does not have puncs) |      ------------/    | (have punctuations)                |
| For ex. (hello how)   |                 /     | For ex. (hello, how are you?)      |
_________________________                       ______________________________________

The output of the search should look like
>>> 'hello, how' (the output should have punctuations)

I cannot simply remove all punctuation from a text like 'hello, how are you?', because it may contain essential punctuation that I must keep. I only want the regex to skip over the , after hello.

The input regex and the text can be anything, one more example:

input_regex = 'Google LLC'
text = 'Google, LLC. is an American multinational technology company.'
# so the output should be
>>> 'Google, LLC.' # with punctuations

So, is there any way to skip these punctuation marks without deleting all punctuation from the entire text? Thanks!

CodePudding user response：

If keeping any and all punctuation and spacing (that is, everything which is not a letter, digit, or underscore) is fine, then you can just use [^\w]* (equivalent to \W*) between and after the words you search for.

match = re.search(r"Google[^\w]*LLC[^\w]*", text)
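
To apply this to arbitrary input instead of a hard-coded pattern, the pattern can be built from the input words. This is only a minimal sketch: the function name search_ignoring_punctuation is made up for illustration, and it assumes the input is plain words separated by whitespace (each word is escaped, so it is not treated as a regex).

```python
import re

def search_ignoring_punctuation(words, text):
    """Search for the words in order, allowing any non-word characters between and after them."""
    # Escape each word so it is matched literally (assumption: the input is plain text).
    parts = [re.escape(word) for word in words.split()]
    # \W* matches zero or more characters that are not letters, digits, or underscore.
    pattern = r"\W*".join(parts) + r"\W*"
    return re.search(pattern, text)

text = "Google, LLC. is an American multinational technology company."
match = search_ignoring_punctuation("Google LLC", text)
# The trailing \W* also swallows the space after the period, so strip it.
print(match.group(0).rstrip())  # Google, LLC.
```

Because \W* also matches zero characters, this pattern would match GoogleLLC as well; use \W+ between the words if at least one separator is required.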

CodePudding user response：

I split both strings into words and check that each word of the text starts with the corresponding word of the input (example in JavaScript).

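function deepSearch(input, text) {
  const [chunkInput, chunkText] = [input.split(' '), text.split(' ')];

  for (let i = 0; i < chunkInput.length; i++) {
    // Fail if the text has fewer words than the input, or if the word does not match.
    if (chunkText[i] === undefined || !chunkText[i].startsWith(chunkInput[i])) {
      return false;
    }
  }
  return true;
}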
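
Since the question uses Python, here is a rough Python counterpart of the word-by-word idea. It is a sketch, not a port: unlike the function above, it scans every position in the text and returns the matched words with their punctuation instead of a boolean. The name find_by_word_prefix is hypothetical.

```python
def find_by_word_prefix(words, text):
    """Return the run of text words that start with the given words, or None."""
    wanted = words.split()
    tokens = text.split()
    # Slide a window of len(wanted) tokens over the text.
    for start in range(len(tokens) - len(wanted) + 1):
        window = tokens[start:start + len(wanted)]
        if all(tok.startswith(w) for tok, w in zip(window, wanted)):
            # Note: joining with single spaces normalises the original whitespace.
            return " ".join(window)
    return None

text = "Google, LLC. is an American multinational technology company."
print(find_by_word_prefix("Google LLC", text))  # Google, LLC.
```

Prefix matching is loose: the input word how would also match however, so this only suits cases where that is acceptable.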

CodePudding user response：

You could automatically modify the search pattern by allowing a comma before each space, i.e. when searching for Google LLC you seem to actually want to search for Google,? LLC.

The question mark in RegEx means "zero or one occurrence".

A simple solution could be:

import re

def searchWithOptionalCommas(needle, haystack):
    # Allow an optional comma before each space in the search pattern
    needle = needle.replace(' ', ',? ')
    return re.search(needle, haystack)
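
The comma-only version returns 'Google, LLC' without the final period. If a few other marks should be optional too, the same idea can be extended as sketched below. The function name and the default set of allowed characters are assumptions for illustration, and the set must not be empty.

```python
import re

def search_with_optional_punctuation(needle, haystack, allowed=",.;:"):
    # Optional single punctuation mark from the allowed set (assumed, non-empty).
    optional = "[" + re.escape(allowed) + "]?"
    # Escape each word so it is matched literally rather than as a regex.
    words = [re.escape(word) for word in needle.split()]
    # Each word may be followed by one allowed mark; words are separated by whitespace.
    pattern = (optional + r"\s+").join(words) + optional
    return re.search(pattern, haystack)

text = "Google, LLC. is an American multinational technology company."
print(search_with_optional_punctuation("Google LLC", text).group(0))  # Google, LLC.
print(search_with_optional_punctuation("hello how", "hello, how are you?").group(0))  # hello, how
```

Restricting the optional characters to an explicit set keeps any other punctuation in the text significant, which fits the requirement that some punctuation is essential.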