I have a string
Manager of Medical Threat Devlop at Micro
I want to find any words that come after at, for, or of. Here, I want to get ['Micro'] (the text at the end of the string, after the last at).
Current code
If I apply r'(?:for|at|of)\s+(.*)', I incorrectly get ['Medical Threat Devlop at Micro'].
More examples:
Manager of Medical Threat Devlop at Canno
->Canno
Manager of Medicalof Threat Devlop of Canno
->Canno
Manager of Medicalfor Threat Devlop for Canno
->Canno
Threat Devlop at Canno Matt
->Canno Matt
CodePudding user response:
Try this: re.split would work for this.
Your question is not fully clear; please give some more input and output examples.
import re
s = 'Manager of Medical Threat Devlop at Micro'
# Split on whole-word at/for/of followed by whitespace; keep the last piece as a list
s = re.split(r'\b(?:at|for|of)\s+', s)[-1:]
print(s)
OUTPUT
IN : OUTPUT
'Manager of Medical Threat Devlop at Micro' : ['Micro']
'Threat Devlop at Canno Matt' : ['Canno Matt']
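A split pattern with no word boundary, such as r'at |for |of ', also fires inside words that merely end in a keyword. The sketch below compares it with a word-boundary version; the input 'Threat Devlop at Great Canno' is a made-up example, not one from the question.

```python
import re

# Hypothetical input: "Great" ends in "at", so a loose pattern splits inside it.
text = 'Threat Devlop at Great Canno'

# No word boundary: "at " also matches inside "Threat " and "Great ".
loose = re.split(r'at |for |of ', text)[-1]

# \b requires the keyword to start a word; \s+ requires whitespace after it.
strict = re.split(r'\b(?:at|for|of)\s+', text)[-1]

print(loose)   # Canno
print(strict)  # Great Canno
```

With the question's own examples both patterns give the same answer; the difference only shows up when the trailing text contains a word ending in at, for or of.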
There is another method to do this (using re.finditer).
import re
string = 'Threat Devlop at Canno Matt'
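# Find every whole-word at/for/of followed by whitespace
s = re.finditer(r'\b(?:at|for|of)\s+', string)
# Slice from the end of the last match (assumes at least one match exists)
last_index = list(s)[-1].end()
print(string[last_index:])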
I am not good at re at all (but I get it).
Yes, there is another way to do this (using re.findall).
import re
string = 'Threat Devlop at Canno of Matjkasa'
# Greedy .* makes the single match run through the last keyword and its trailing whitespace
s = re.findall(r'.*\b(?:at|for|of)\s+', string)
print(string.replace(*s,''))
CodePudding user response:
You can use
re.findall(r'.*\b(?:for|at|of)\s+(.*)', text)
See the regex demo. Details:
.* - any zero or more chars other than line break chars, as many as possible
\b - a word boundary
(?:for|at|of) - for, at or of
\s+ - one or more whitespaces
(.*) - Group 1: any zero or more chars other than line break chars, as many as possible.
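As a rough check of the pattern described above, here is a minimal sketch that runs it over the examples from the question. The helper name text_after_last_keyword is made up for illustration.

```python
import re

# Greedy ".*" consumes as much as possible, so the keyword that ends up
# matching is the last whole-word for/at/of that is followed by whitespace.
PATTERN = re.compile(r'.*\b(?:for|at|of)\s+(.*)')

def text_after_last_keyword(text):
    """Return the text after the last keyword, or None if there is no keyword."""
    match = PATTERN.search(text)
    return match.group(1) if match else None

examples = [
    'Manager of Medical Threat Devlop at Canno',
    'Manager of Medicalof Threat Devlop of Canno',
    'Manager of Medicalfor Threat Devlop for Canno',
    'Threat Devlop at Canno Matt',
]
for example in examples:
    print(text_after_last_keyword(example))
```

The first three inputs print Canno and the last prints Canno Matt. The "of" inside "Medicalof" is skipped because \b needs the keyword to start a word.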
Another regex that will fetch the same results is
re.findall(r'\b(?:for|at|of)\s+((?:(?!\b(?:for|at|of)\b).)*)$', text)
Details:
\b - a word boundary
(?:for|at|of) - for, at or of
\s+ - one or more whitespaces
((?:(?!\b(?:for|at|of)\b).)*) - Group 1: any char other than a line break char, zero or more occurrences but as many as possible, that does not start a for, at or of whole-word char sequence
$ - end of string.
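The lookahead version can be tried the same way. This is only a sketch: the variable names are illustrative, and the pattern is the one from this answer with \s+ as described in the details.

```python
import re

# After a keyword, consume characters one at a time, refusing any position
# where another whole-word keyword starts, and require the end of the string.
PATTERN = re.compile(r'\b(?:for|at|of)\s+((?:(?!\b(?:for|at|of)\b).)*)$')

text = 'Manager of Medicalof Threat Devlop of Canno'

# The first "of" fails because a later whole-word "of" blocks the path to "$";
# only the last keyword can reach the end of the string.
print(PATTERN.findall(text))  # ['Canno']

match = PATTERN.search('Threat Devlop at Canno Matt')
tail = match.group(1) if match else None
print(tail)  # Canno Matt
```

Unlike the greedy .* pattern, this one does not have to match from the start of the string, since the $ anchor and the lookahead together pin it to the last keyword.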
Note you can also use re.search since you expect a single match:
match = re.search(r'.*\b(?:for|at|of)\s+(.*)', text)
if match:
    print(match.group(1))
CodePudding user response:
If you want to do it with a regex, then here's the way to do it.
Replace matches of the following regex with the empty string:
.*\b(?:for|at|of)\b\s?
This will match:
.* : any character (being greedy, this pattern will match as many characters as possible)
\b(?:for|at|of)\b : your hotwords between word boundaries
\s? : an optional space
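In Python, the replacement described above could be done with re.sub. This is a sketch under that assumption; the function name strip_through_last_keyword is hypothetical.

```python
import re

def strip_through_last_keyword(text):
    """Delete everything up to and including the last whole-word for/at/of."""
    # Greedy .* pushes the match to the last keyword; \s? drops one space after it.
    return re.sub(r'.*\b(?:for|at|of)\b\s?', '', text)

print(strip_through_last_keyword('Manager of Medical Threat Devlop at Micro'))  # Micro
print(strip_through_last_keyword('Threat Devlop at Canno Matt'))                # Canno Matt
print(strip_through_last_keyword('Canno'))                                      # Canno
```

A string with no keyword comes back unchanged rather than empty, so the caller cannot tell the two cases apart from the result alone. Since \s? removes at most one space, inputs with several spaces after the keyword would keep the extras.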
Check the demo here