import re
name = "John"
#In these examples it works fine
input_sense_aux = "These sound system are too many, I think John can help us, otherwise it will be waiting for a while longer"
#input_sense_aux = "These sound system are too many but I know that John can help us, otherwise it will be waiting for a while longer"
#input_sense_aux = "These sound system are too many but I know that John can help us. otherwise it will be waiting for a while longer"
#input_sense_aux = "Do you know if John with the others could come this afternoon?"
#In these examples it does not work well
#input_sense_aux = "John can help us, otherwise it will be waiting for a while longer"
#input_sense_aux = "Can you help us, otherwise it will be waiting for a while longer for John"
#input_sense_aux = "sorry! can you help us? otherwise it will be waiting for a while longer for John"
regex_patron_m1 = r"\s*((?:\w\s*)+)\s*?" + name + r"\s*((?:\w\s*)+)\s*\??"
m1 = re.search(regex_patron_m1, input_sense_aux, re.IGNORECASE)  # check whether the regex matches, i.e. whether the block below runs
if m1:
    something_1, something_2 = m1.groups()
    something_1 = something_1.strip()
    something_2 = something_2.strip()
    print(repr(something_1))
    print(repr(something_2))
I need the regex to grab the content before "John" like this:
(start of sentence|¿|¡|,|;|:|(|[|.) \s* "content for something_1" \s* John
And then:
John \s* "content for something_2" \s* (end of sentence|?|!|,|;|:|)|]|.)
In the first examples, the regex works fine:
'these teams are too many but I know that'
'can help us'
'Do you know if'
'with the others could come this afternoon'
But for the last 3 examples the regex does not return anything.
I need help generalizing my regex to cover all these cases while still respecting the conditions under which it must extract the content of something_1 and something_2.
For the 3 last examples, the expected results are:
''
' can help us'
' otherwise it will be waiting for a while longer for '
''
' otherwise it will be waiting for a while longer for '
''
CodePudding user response:
You can use
import re
name = "John"
input_sense_auxs = [
"These sound system are too many, I think John can help us, otherwise it will be waiting for a while longer",
"These sound system are too many but I know that John can help us, otherwise it will be waiting for a while longer",
"These sound system are too many but I know that John can help us. otherwise it will be waiting for a while longer",
"Do you know if John with the others could come this afternoon?",
"John can help us, otherwise it will be waiting for a while longer",
"Can you help us, otherwise it will be waiting for a while longer for John",
"sorry! can you help us? otherwise it will be waiting for a while longer for John"]
regex_patron_m1 = fr'(?:^|[?!¿¡,;:([.])\s*(?:(\w+(?:\s+\w+)*)\s*)?{name}(?:\s*(\w+(?:\s+\w+)*))?\s*(?:$|[]?!,;:).])'
# r"\s*((?:\w\s*)+)\s*?" + name + r"\s*((?:\w\s*)+)\s*\??"
for input_sense_aux in input_sense_auxs:
    print(f'--- {input_sense_aux} ---')
    m1 = re.search(regex_patron_m1, input_sense_aux, re.IGNORECASE)  # check whether the regex matches, i.e. whether the block below runs
    if m1:
        something_1, something_2 = m1.groups()
        # An optional group that did not take part in the match is None
        something_1 = something_1.strip() if something_1 else ""
        something_2 = something_2.strip() if something_2 else ""
        print(repr(something_1))
        print(repr(something_2))
Output:
--- These sound system are too many, I think John can help us, otherwise it will be waiting for a while longer ---
'I think'
'can help us'
--- These sound system are too many but I know that John can help us, otherwise it will be waiting for a while longer ---
'These sound system are too many but I know that'
'can help us'
--- These sound system are too many but I know that John can help us. otherwise it will be waiting for a while longer ---
'These sound system are too many but I know that'
'can help us'
--- Do you know if John with the others could come this afternoon? ---
'Do you know if'
'with the others could come this afternoon'
--- John can help us, otherwise it will be waiting for a while longer ---
''
'can help us'
--- Can you help us, otherwise it will be waiting for a while longer for John ---
'otherwise it will be waiting for a while longer for'
''
--- sorry! can you help us? otherwise it will be waiting for a while longer for John ---
'otherwise it will be waiting for a while longer for'
''
See the Python demo.
Details:
- (?:^|[?!¿¡,;:([.])\s*(?:(\w+(?:\s+\w+)*)\s*)? - the prefix, the left-hand side part, that matches
  - (?:^|[?!¿¡,;:([.]) - either start of string or a char from the ?!¿¡,;:([. set
  - \s* - zero or more whitespaces
  - (?:(\w+(?:\s+\w+)*)\s*)? - an optional occurrence of
    - (\w+(?:\s+\w+)*) - Group 1: one or more word chars and then zero or more sequences of one or more whitespaces and one or more word chars
    - \s* - zero or more whitespaces
- John - the name
- (?:\s*(\w+(?:\s+\w+)*))?\s*(?:$|[]?!,;:).]) - the right-hand part:
  - \s* - zero or more whitespaces
  - (\w+(?:\s+\w+)*) - Group 2: an optional sequence of one or more word chars and then zero or more occurrences of one or more whitespaces followed with one or more word chars
  - \s* - zero or more whitespaces
  - (?:$|[]?!,;:).]) - end of string or a char from the ]?!,;:). charset.
See the regex demo.
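The breakdown above can be mirrored directly in code. The following is a minimal sketch, not part of the original answer: it rewrites the same pattern with re.VERBOSE so each piece carries its own comment. The helper names build_pattern and extract are hypothetical, and re.escape(name) is an added safeguard that assumes the name should always be matched literally.

```python
import re


def build_pattern(name):
    """Compile the delimiter-aware pattern with inline comments (re.VERBOSE)."""
    # re.escape is an added safeguard so a name with regex metacharacters stays literal.
    return re.compile(rf"""
        (?:^|[?!¿¡,;:(\[.])        # start of string or an opening delimiter
        \s*
        (?:(\w+(?:\s+\w+)*)\s*)?   # optional Group 1: words before the name
        {re.escape(name)}          # the name itself
        (?:\s*(\w+(?:\s+\w+)*))?   # optional Group 2: words after the name
        \s*
        (?:$|[\]?!,;:).])          # end of string or a closing delimiter
    """, re.IGNORECASE | re.VERBOSE)


def extract(name, text):
    """Return (before, after) around the name, or None when nothing matches."""
    m = build_pattern(name).search(text)
    if m is None:
        return None
    # groups(default="") turns an unmatched optional group into "" instead of None.
    return m.groups(default="")


print(extract("John", "John can help us, otherwise it will be waiting for a while longer"))
print(extract("John", "Do you know if John with the others could come this afternoon?"))
```

Because both groups start and end on a word character, the captured text needs no extra strip() call in this variant.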
CodePudding user response:
Take this improved version of the code alongside some explanation, so you can customize it how you want:
import re
name = "John"
#In these examples it works fine
# input_sense_aux = "These sound system are too many, I think John can help us, otherwise it will be waiting for a while longer"
#input_sense_aux = "These sound system are too many but I know that John can help us, otherwise it will be waiting for a while longer"
# input_sense_aux = "These sound system are too many but I know that John can help us. otherwise it will be waiting for a while longer"
# input_sense_aux = "Do you know if John with the others could come this afternoon?"
#In these examples it does not work well
input_sense_aux = "John can help us, otherwise it will be waiting for a while longer"
# input_sense_aux = "Can you help us, otherwise it will be waiting for a while longer for John"
# input_sense_aux = "sorry! can you help us? otherwise it will be waiting for a while longer for John"
regex_patron_m1 = r"\s*([\?:\w\s*]+)?\s*" + name + r"\s*([\?:\w\s*]+)?\s*"
m1 = re.search(regex_patron_m1, input_sense_aux, re.IGNORECASE)  # check whether the regex matches, i.e. whether the block below runs
if m1:
    something_1, something_2 = m1.groups()
    # Either group can be None when the name sits at the start or end of the string
    if something_1 is not None:
        something_1 = something_1.strip()
        print(repr(something_1))
    if something_2 is not None:
        something_2 = something_2.strip()
        print(repr(something_2))
In the first two examples that did not work, you have put John at the start/end of the string. This means that one of the two something variables could be None. I have fixed your code to check for that.
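To make the None case concrete, here is a small sketch. The pattern is a simplified, hypothetical one (not the exact regex from this answer) with one optional group on each side of the name. It also shows Match.groups(default=""), which is one possible alternative to checking each group for None by hand.

```python
import re

name = "John"
# Simplified, hypothetical pattern: one optional capture group on each side of the name.
pattern = r"([\w\s]+)?" + re.escape(name) + r"([\w\s]+)?"

m = re.search(pattern, "John can help us, otherwise it will be waiting for a while longer")

# An optional group that did not take part in the match is reported as None.
before, after = m.groups()
# The same groups again, with "" substituted for every unmatched group.
safe_before, safe_after = m.groups(default="")

print(repr(before), repr(after.strip()))
print(repr(safe_before), repr(safe_after.strip()))
```

With default="" both values are always strings, so calling strip() on them is safe without an explicit check.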
Now to the regex:
This was the original: r"\s*((?:\w\s*)+)\s*?" + name + r"\s*((?:\w\s*)+)\s*\??"
I made the following changes:
- Removed \?? from the end. There, \? is an escaped literal question mark and the second ? is a quantifier meaning "once or none", so this part only matched an optional trailing "?". It comes right after \s*, which already means "zero or more" spaces, and it does not change what the groups capture, so it is not needed.
- Changed the inner statements from () to []. Round brackets are for groups, for example to get a certain part of the string; square brackets are for character groups that check "is any of these characters here?". The new pattern checks whether word characters \w, spaces \s, colons :, question marks ? or asterisks * are present. To check for more, you would add characters inside the square brackets, but beware: inside square brackets, \ ] ^ and - have a special meaning and need to be escaped with a preceding backslash \ to be matched literally.
- Made the character groups optional with ?. When "John" is the first part of the string, there is nothing to match in front of it, so your regex fails. By making the before- and after-part optional, you can also match those strings.
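As a side note on escaping inside square brackets, here is a quick check you can run yourself. It is only a sketch of how Python's re module behaves, with made-up test strings: most punctuation is literal inside a character group, so a backslash in front of it is harmless but optional, while ] and \ do need one.

```python
import re

# Inside square brackets, ? * : . ( ) are plain characters, no backslash needed.
literal = re.findall(r"[?*:.()]", "a?b*c:d.e(f)g")

# ] and \ need a backslash; ^ is only special as the first character,
# and - is only special between two characters (here it is last, so literal).
escaped = re.findall(r"[\]\\^-]", r"a]b\c^d-e")

print(literal)
print(escaped)
```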
If you have any remaining questions feel free to ask in the comments.