regex lookahead n times python

I have the following sentence as an example:

isaac morka morka morka

I am trying to get the following result:

isaac morka

I tried the following code:

re.findall(r'isaac[\s\w]+(?=morka)', 'isaac morka morka morka')

but the result is not what I expected:

['isaac morka morka ']
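This assumes the pattern was meant to be `isaac[\s\w]+(?=morka)`. The `+` appears to have been lost in formatting, and with a literal space in its place the pattern finds nothing. Under that assumption, the result comes from the greedy quantifier. The following minimal sketch compares it with a lazy `+?`:

```python
import re

s = 'isaac morka morka morka'

# Greedy: [\s\w]+ first consumes the rest of the string, then backtracks
# until the lookahead (?=morka) succeeds, i.e. just before the LAST 'morka'.
greedy = re.findall(r'isaac[\s\w]+(?=morka)', s)

# Lazy: [\s\w]+? grows one character at a time, so the lookahead succeeds
# just before the FIRST 'morka'. A lookahead does not consume text,
# so 'morka' itself is not part of the match.
lazy = re.findall(r'isaac[\s\w]+?(?=morka)', s)

print(greedy)  # ['isaac morka morka ']
print(lazy)    # ['isaac ']
```

The lazy version stops in the right place. It still leaves out `morka` because of the lookahead, which is why the answers below match `morka` directly instead.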

Answer:

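You can simplify your regex to isaac[\s\S]+?morka, without using a lookaround. The lazy +? matches as few characters as possible, so the match ends at the first morka after isaac.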

Test the regex here: https://regex101.com/r/bmp5pH/1
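`[\s\S]` is a common idiom for "any character, including a newline". The `?` after `+` makes the quantifier lazy. The small sketch below uses a hypothetical input containing a line break to show how this differs from `.` with and without `re.DOTALL`:

```python
import re

s = 'isaac\nmorka morka morka'  # hypothetical input containing a line break

# [\s\S] matches any character, including '\n'; the lazy +? stops at the
# first 'morka' after 'isaac'.
any_char = re.findall(r'isaac[\s\S]+?morka', s)

# '.' does not match '\n' by default, so this finds nothing...
dot_default = re.findall(r'isaac.+?morka', s)

# ...unless re.DOTALL is set, which makes '.' match '\n' as well.
dot_all = re.findall(r'isaac.+?morka', s, flags=re.DOTALL)

print(any_char)     # ['isaac\nmorka']
print(dot_default)  # []
print(dot_all)      # ['isaac\nmorka']
```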

Answer:

You can use

import re

rgx = r'\bisaac\b|\bmorka\b(?!.*\bmorka\b)'

s = 'isaac morka morka morka'   # avoid shadowing the built-in str

re.findall(rgx, s)
  #=> ['isaac', 'morka']

Let's break down the regular expression.

\bisaac\b   # match 'isaac' with word boundaries fore and aft
|           # or
\bmorka\b   # match 'morka' with word boundaries fore and aft
(?!         # begin negative lookahead
  .*        # match zero or more characters
  \bmorka\b # match 'morka' with word boundaries fore and aft
)           # end negative lookahead
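Python can compile the commented breakdown above directly with the `re.VERBOSE` flag, which ignores unescaped whitespace and `#` comments in the pattern. Here is a sketch. Because `findall` returns the two words separately, it uses `' '.join` to get the single string `isaac morka` asked for in the question. That joining step is an assumption and is not part of the answer:

```python
import re

rgx = re.compile(r"""
    \bisaac\b       # match 'isaac' with word boundaries fore and aft
    |               # or
    \bmorka\b       # match 'morka' with word boundaries fore and aft
    (?!             # begin negative lookahead
      .*            # match zero or more characters (other than newlines)
      \bmorka\b     # match 'morka' with word boundaries fore and aft
    )               # end negative lookahead
""", re.VERBOSE)

words = rgx.findall('isaac morka morka morka')
joined = ' '.join(words)

print(words)   # ['isaac', 'morka']
print(joined)  # isaac morka
```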

If you want a list of the distinct words in the string, with each word listed once, you could write the following.

s = "isaac morka louie morka isaac morka"

rgx = r'\b(\w+)\b(?!.*\b\1\b)'

re.findall(rgx, s)
  #=> ['louie', 'isaac', 'morka']

\b(\w+)\b   # match one or more word characters with word boundaries
            # fore and aft and save to capture group 1
(?!         # begin negative lookahead
  .*        # match zero or more characters
  \b\1\b    # match the content of capture group 1 with word boundaries
            # fore and aft
)           # end negative lookahead
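The lookahead has one consequence worth noting. A word is kept only if it does not appear again later in the string, so the list is ordered by each word's last occurrence. The following sketch contrasts this with `dict.fromkeys` on `str.split()`, which keeps first-occurrence order. That alternative is not regex-based and is not from the answer:

```python
import re

s = "isaac morka louie morka isaac morka"

# The negative lookahead rejects a word whenever the same word appears
# again later, so only the LAST occurrence of each word survives.
by_last = re.findall(r'\b(\w+)\b(?!.*\b\1\b)', s)

# For comparison: dict keys preserve insertion order, so this keeps the
# FIRST occurrence of each whitespace-separated word.
by_first = list(dict.fromkeys(s.split()))

print(by_last)   # ['louie', 'isaac', 'morka']
print(by_first)  # ['isaac', 'morka', 'louie']
```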

Answer:

The regex is the same as the one @anotherGatsby used. The code snippet below gives the required result.

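import re

text = 'isaac morka morka morka'   # the sentence from the question

# Raw string avoids invalid-escape warnings for \s and \S; the lazy +?
# stops the match at the first 'morka' after 'isaac'.
rex = re.compile(r'isaac[\s\S]+?morka')

print(rex.findall(text))  # ['isaac morka']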