Find first match in string from the end

I have a string

Manager of Medical Threat Devlop at Micro

I want to find the words that come after at, for, or of. Here, I want to get ['Micro'] (the text at the end of the string, after the last at).

Current code

If I apply r'(?:for|at|of)\s+(.*)' I get the incorrect result ['Medical Threat Devlop at Micro'].
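A minimal sketch of why this happens, assuming the whitespace part of the pattern is \s+ (one or more whitespace characters): the regex engine scans from left to right, so the match begins at the first keyword it meets, and the greedy (.*) then takes everything up to the end of the string.

```python
import re

text = 'Manager of Medical Threat Devlop at Micro'

# The engine reports the leftmost match, which starts at the first keyword ("of").
match = re.search(r'(?:for|at|of)\s+(.*)', text)

print(match.start())   # 8, the index of "of"
print(match.group(1))  # Medical Threat Devlop at Micro
```

So the pattern itself is fine for one keyword; the problem is only that nothing forces the match to start at the last keyword.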

More examples:

Manager of Medical Threat Devlop at Canno -> Canno
Manager of Medicalof Threat Devlop of Canno -> Canno
Manager of Medicalfor Threat Devlop for Canno -> Canno
Threat Devlop at Canno Matt -> Canno Matt

CodePudding user response：

Try re.split; it should work for this.

Your question is not fully clear; please give some more input and output examples.

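import re

s = 'Manager of Medical Threat Devlop at Micro'
# Split on each whole-word keyword plus its trailing whitespace; keep the last piece.
# \b stops "at" inside words such as "Threat" from being treated as a keyword.
s = re.split(r'\b(?:at|for|of)\s+', s)[-1:]
print(s)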

OUTPUT

                 IN                         :  OUTPUT
'Manager of Medical Threat Devlop at Micro' : ['Micro']
'Threat Devlop at Canno Matt'               : ['Canno Matt']

There is another way to do this, using re.finditer.

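import re

string = 'Threat Devlop at Canno Matt'
# Collect every whole-word keyword match, then slice after the last one.
matches = list(re.finditer(r'\b(?:at|for|of)\s+', string))
last_index = matches[-1].end()  # raises IndexError if no keyword is present
print(string[last_index:])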

I am not good at re at all (but I get it).

Yeah, there is another way to do this, using re.findall.


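import re

string = 'Threat Devlop at Canno of Matjkasa'
# The greedy .* makes the single match run through the last keyword and its whitespace.
s = re.findall(r'.*\b(?:at|for|of)\s+', string)

# Remove that prefix; s holds exactly one item when a keyword is present.
print(string.replace(*s, ''))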

CodePudding user response：

You can use

re.findall(r'.*\b(?:for|at|of)\s+(.*)', text)

Details:

.* - any zero or more chars other than line break chars, as many as possible
\b - a word boundary
(?:for|at|of) - for, at or of
\s+ - one or more whitespaces
(.*) - Group 1: any zero or more chars other than line break chars, as many as possible.
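As a quick check, the pattern can be run against the inputs from the question. This is only a sketch: the helper name text_after_last_keyword is hypothetical, and it returns None when no keyword is found, which is an assumption about the desired behaviour.

```python
import re

# Pattern from the answer above; the leading greedy .* pushes the match to the last keyword.
LAST_KEYWORD = re.compile(r'.*\b(?:for|at|of)\s+(.*)')

def text_after_last_keyword(text):
    """Return the text after the last keyword, or None if there is no keyword."""
    match = LAST_KEYWORD.search(text)
    return match.group(1) if match else None

examples = [
    'Manager of Medical Threat Devlop at Canno',
    'Manager of Medicalof Threat Devlop of Canno',
    'Manager of Medicalfor Threat Devlop for Canno',
    'Threat Devlop at Canno Matt',
]
for text in examples:
    print(repr(text), '->', repr(text_after_last_keyword(text)))
```

Because of \b, the "of" inside "Medicalof" and the "at" inside "Threat" or "Matt" are not treated as keywords.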

Another regex that will fetch the same results is

re.findall(r'\b(?:for|at|of)\s+((?:(?!\b(?:for|at|of)\b).)*)$', text)

Details:

\b - a word boundary
(?:for|at|of) - for, at or of
\s+ - one or more whitespaces
((?:(?!\b(?:for|at|of)\b).)*) - Group 1: zero or more chars other than line break chars, as many as possible, none of which starts for, at or of as a whole word
$ - end of string.
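The two patterns can be compared side by side on the inputs from the question. This is a small sketch, with GREEDY and TEMPERED as hypothetical names for the two regexes; it only checks these sample strings and does not claim the patterns agree on every possible input.

```python
import re

GREEDY = r'.*\b(?:for|at|of)\s+(.*)'
TEMPERED = r'\b(?:for|at|of)\s+((?:(?!\b(?:for|at|of)\b).)*)$'

examples = [
    'Manager of Medical Threat Devlop at Canno',
    'Manager of Medicalof Threat Devlop of Canno',
    'Manager of Medicalfor Threat Devlop for Canno',
    'Threat Devlop at Canno Matt',
]

# Map each input to the (greedy, tempered) findall results.
results = {
    text: (re.findall(GREEDY, text), re.findall(TEMPERED, text))
    for text in examples
}
for text, (greedy, tempered) in results.items():
    print(repr(text), '->', greedy, tempered)
```

The first pattern relies on backtracking from the end of the string, while the second refuses to let the captured group cross another whole-word keyword and then requires the end of the string.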

Note you can also use re.search since you expect a single match:

match = re.search(r'.*\b(?:for|at|of)\s+(.*)', text)
if match:
    print(match.group(1))

CodePudding user response：

If you want to do it with a regex, then here's the way to do it.

Replace matches of the following regex with the empty string:

.*\b(?:for|at|of)\b\s?

This will match:

.*: any characters other than line breaks (being greedy, this pattern will match as many characters as possible)
\b(?:for|at|of)\b: your keywords between word boundaries
\s?: an optional whitespace character
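In Python, the replacement can be sketched with re.sub. The function name strip_through_last_keyword is hypothetical and only used for illustration.

```python
import re

def strip_through_last_keyword(text):
    """Delete everything up to and including the last keyword and one optional space."""
    return re.sub(r'.*\b(?:for|at|of)\b\s?', '', text)

print(strip_through_last_keyword('Manager of Medical Threat Devlop at Micro'))  # Micro
print(strip_through_last_keyword('Threat Devlop at Canno Matt'))                # Canno Matt
print(strip_through_last_keyword('No keyword here'))                            # No keyword here
```

One difference from the capturing approaches: when no keyword is present, nothing is replaced and the whole input comes back unchanged, so the caller cannot tell "no keyword" from "keyword at the very start" without an extra check.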
