I have a text like this:
text = 'hello, how are you?'
I want to extract 'hello, how' from the text, but a plain search finds nothing:
re.search('hello how', text)
>>> None
You may wonder why I don't simply include the comma in the pattern. The pattern I search for comes from another text and is passed in as the input regex. That input has no punctuation, while the text I search in does. So I want the regex to skip over punctuation, for example the , after hello.
+------------------------+                +-----------------------------------+
| Input regex            |  ----------->  | Text to extract from              |
| (no punctuation)       |                | (has punctuation)                 |
| e.g. hello how         |                | e.g. hello, how are you?          |
+------------------------+                +-----------------------------------+
The output of the search should look like this:
>>> 'hello, how' (the output should keep the punctuation)
I cannot simply remove all punctuation from a text like 'hello, how are you?', because it may contain essential punctuation that I must not delete. I only want the regex to skip over the , after hello.
The input regex and the text can be anything. One more example:
input_regex = 'Google LLC'
text = 'Google, LLC. is an American multinational technology company.'
# so the output should be
>>> 'Google, LLC.' # with punctuations
So is there any way to skip over this punctuation without deleting all punctuation from the entire text? Thanks!
CodePudding user response:
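If keeping all punctuation and whitespace (anything that is not a letter, digit, or underscore) is fine, then you can put [^\w]* (equivalent to \W*) between and after the words you search for. Note that the trailing [^\w]* also captures the space that follows the last word.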
match = re.search(r"Google[^\w]*LLC[^\w]*", text)
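The same idea can be generalized so the pattern is built from the input instead of being written by hand. The following is a minimal sketch, not a definitive implementation: the function name search_ignoring_punctuation is hypothetical, it assumes the input words are separated by whitespace, it requires at least one non-word character between words (\W+ rather than \W*), and it lets only trailing punctuation, not trailing whitespace, into the match.

```python
import re

def search_ignoring_punctuation(needle, haystack):
    """Return the part of haystack matching needle's words, punctuation included."""
    # Escape each word so regex metacharacters in the input are matched literally.
    words = [re.escape(word) for word in needle.split()]
    # Allow any run of non-word characters between words (assumption: at least one),
    # then optional trailing punctuation that is neither a word character nor whitespace.
    pattern = r"\W+".join(words) + r"[^\w\s]*"
    match = re.search(pattern, haystack)
    return match.group(0) if match else None

print(search_ignoring_punctuation('hello how', 'hello, how are you?'))
# hello, how
print(search_ignoring_punctuation(
    'Google LLC',
    'Google, LLC. is an American multinational technology company.'))
# Google, LLC.
```

The text itself is never modified, so punctuation elsewhere in it is left untouched.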
CodePudding user response:
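I split both strings into words and check that each word of the text starts with the corresponding word of the input. Note that this only compares from the beginning of the text and returns a boolean rather than the matched substring.
function deepSearch(input, text) {
  const [chunkInput, chunkText] = [input.split(' '), text.split(' ')];
  for (let i = 0; i < chunkInput.length; i++) {
    // Guard against a text with fewer words than the input.
    if (i >= chunkText.length || !chunkText[i].startsWith(chunkInput[i])) {
      return false;
    }
  }
  return true;
}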
CodePudding user response:
You could automatically modify the search pattern to allow an optional comma before each space. That is, when searching for Google LLC you seem to actually want to search for Google,? LLC.
The question mark in a regex means "zero or one occurrence" of the preceding element.
A simple solution could be:
import re

def searchWithOptionalCommas(needle, haystack):
    # Allow an optional comma before every space in the search pattern.
    needle = needle.replace(' ', ',? ')
    return re.search(needle, haystack)
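As a usage sketch of the optional-comma approach, the variant below additionally escapes each input word so that characters such as . or + in the input are matched literally. The snake_case name search_with_optional_commas is hypothetical, and the sketch assumes the input words are separated by single spaces.

```python
import re

def search_with_optional_commas(needle, haystack):
    # Escape each word so regex metacharacters in the input are matched literally.
    words = [re.escape(word) for word in needle.split(' ')]
    # Join the words with an optional comma followed by a space.
    return re.search(',? '.join(words), haystack)

text = 'Google, LLC. is an American multinational technology company.'
match = search_with_optional_commas('Google LLC', text)
print(match.group(0))  # Google, LLC
```

Because only commas before spaces are allowed, the trailing period in 'Google, LLC.' is not part of the match. Other punctuation would need to be added to the pattern in the same way.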