Find first match in string from the end

Time:07-07

I have a string

Manager of Medical Threat Devlop at Micro

I want to capture the text that follows the last occurrence of at, for, or of. Here, I want to get ['Micro'] (the text at the end of the string, after the last at).

Current code

If I apply r'(?:for|at|of)\s+(.*)', I get the incorrect result ['Medical Threat Devlop at Micro'], because the match starts at the first keyword rather than the last.

More examples:

  • Manager of Medical Threat Devlop at Canno -> Canno
  • Manager of Medicalof Threat Devlop of Canno -> Canno
  • Manager of Medicalfor Threat Devlop for Canno -> Canno
  • Threat Devlop at Canno Matt -> Canno Matt

CodePudding user response:

Try re.split; it should work here.

Your question is not fully clear; please give some more input and output examples.

import re
s = 'Manager of Medical Threat Devlop at Micro'
# \b keeps the keywords from matching inside longer words; \s+ eats the spaces after them.
s = re.split(r'\b(?:at|for|of)\s+', s)[-1:]
print(s)
OUTPUT
                 IN                         :  OUTPUT
'Manager of Medical Threat Devlop at Micro' : ['Micro']
'Threat Devlop at Canno Matt'               : ['Canno Matt']
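
One caveat with splitting: a pattern without word boundaries, such as r'at |for |of ', also matches inside longer words. The sketch below uses a made-up input ('Devlop at Great Canno' is not from the question) to show the difference that \b makes.

```python
import re

# Hypothetical input: "Great" ends in "at" and is followed by a space.
s = 'Devlop at Great Canno'

# Without \b, 'at ' also matches inside 'Great ' and cuts the result short.
naive = re.split(r'at |for |of ', s)[-1]

# With \b, only whole-word keywords act as separators.
bounded = re.split(r'\b(?:at|for|of)\s+', s)[-1]

print(naive)    # Canno
print(bounded)  # Great Canno
```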

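There is another method to do this (using re.finditer).

import re
string = 'Threat Devlop at Canno Matt'
# Find every whole-word keyword followed by whitespace.
matches = list(re.finditer(r'\b(?:at|for|of)\s+', string))
if matches:
    # Slice from the end of the last match to the end of the string.
    print(string[matches[-1].end():])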

I am not good at re at all (but I get it).

There is another way to do this (using re.findall).


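import re
string = 'Threat Devlop at Canno of Matjkasa'
# Greedy .* consumes everything up to and including the last keyword.
s = re.findall(r'.*\b(?:at|for|of)\s+', string)

# Assumes exactly one match; the unpacking fails if no keyword is present.
print(string.replace(*s, ''))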

CodePudding user response:

You can use

re.findall(r'.*\b(?:for|at|of)\s+(.*)', text)

See the regex demo. Details:

  • .* - any zero or more chars other than line break chars, as many as possible
  • \b - a word boundary
  • (?:for|at|of) - for, at or of
  • \s+ - one or more whitespaces
  • (.*) - Group 1: any zero or more chars other than line break chars, as many as possible.

Another regex that will fetch the same results is

re.findall(r'\b(?:for|at|of)\s+((?:(?!\b(?:for|at|of)\b).)*)$', text)

Details:

  • \b - a word boundary
  • (?:for|at|of) - for, at or of
  • \s+ - one or more whitespaces
  • ((?:(?!\b(?:for|at|of)\b).)*) - Group 1: any char other than a line break char, zero or more occurrences, as many as possible, where no char starts a whole-word for, at or of
  • $ - end of string.
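
As a rough sketch, the second pattern can be written with re.VERBOSE so that each piece from the breakdown above sits on its own line. The names KEYWORDS and LAST_KEYWORD_TAIL are made up for this illustration; the inputs are taken from the question.

```python
import re

# Hypothetical helper name; the alternation itself comes from the answer.
KEYWORDS = r'(?:for|at|of)'

LAST_KEYWORD_TAIL = re.compile(
    rf'''
    \b{KEYWORDS}\s+                 # a whole-word keyword, then whitespace
    (                               # group 1: the tail we want
      (?: (?!\b{KEYWORDS}\b) . )*   # chars that do not start another keyword
    )
    $                               # the tail must run to the end of the string
    ''',
    re.VERBOSE,
)

print(LAST_KEYWORD_TAIL.findall('Manager of Medical Threat Devlop at Micro'))
print(LAST_KEYWORD_TAIL.findall('Threat Devlop at Canno Matt'))
```

Unlike the greedy .* version, this pattern does not rely on backtracking from the end of the line; it rejects any candidate tail that contains another keyword.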

Note you can also use re.search since you expect a single match:

match = re.search(r'.*\b(?:for|at|of)\s+(.*)', text)
if match:
    print(match.group(1))
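
To check the re.search form against the examples from the question, it can be wrapped in a small helper. This is only a sketch: the function name text_after_last_keyword is invented here, and returning None when no keyword is found is an assumption about the desired behaviour.

```python
import re

# Greedy .* pushes the match to the last whole-word keyword.
PATTERN = re.compile(r'.*\b(?:for|at|of)\s+(.*)')

def text_after_last_keyword(text):
    """Return the text after the last for/at/of, or None if there is none."""
    match = PATTERN.search(text)
    return match.group(1) if match else None

examples = [
    'Manager of Medical Threat Devlop at Canno',
    'Manager of Medicalof Threat Devlop of Canno',
    'Manager of Medicalfor Threat Devlop for Canno',
    'Threat Devlop at Canno Matt',
]
for example in examples:
    print(f'{example!r} -> {text_after_last_keyword(example)!r}')
```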

CodePudding user response:

If you want to do it with a regex, then here's the way to do it.

Replace matches of the following regex with the empty string:

.*\b(?:for|at|of)\b\s?

This will match:

  • .*: any zero or more characters other than line breaks (being greedy, it matches as many characters as possible, so the match runs through the last keyword)
  • \b(?:for|at|of)\b: your keywords, as whole words between word boundaries
  • \s?: an optional whitespace character
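
In Python, the replacement can be done with re.sub. The following is a minimal sketch; the function name strip_through_last_keyword is hypothetical, and the last two inputs are made up to show edge cases.

```python
import re

def strip_through_last_keyword(text):
    """Delete everything up to and including the last whole-word for/at/of."""
    return re.sub(r'.*\b(?:for|at|of)\b\s?', '', text)

print(strip_through_last_keyword('Manager of Medical Threat Devlop at Micro'))  # Micro

# With no keyword present, nothing is replaced and the input comes back unchanged.
print(strip_through_last_keyword('No keyword here'))  # No keyword here

# \s? removes at most one whitespace character, so a second space survives.
print(repr(strip_through_last_keyword('Devlop at  Micro')))  # ' Micro'
```

Note the difference from the capture-group approaches: when no keyword is found, substitution returns the whole string rather than signalling "no match", so callers that need to tell the two cases apart may prefer re.search.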

