I have the following example string:
isaac morka morka morka
I am trying to get the following result:
isaac morka
I tried the following code:
re.findall(r'isaac[\s\w]+(?=morka)', 'isaac morka morka morka')
but the result is not what I want:
['isaac morka morka']
CodePudding user response:
You can simplify your regex to isaac[\s\S]+?morka
without using lookaround. The lazy quantifier +? stops at the first morka instead of the last.
Test the regex here: https://regex101.com/r/bmp5pH/1
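A minimal sketch of the difference, assuming the intended patterns use `+` and the lazy `+?` quantifier; the variable names `text`, `greedy`, and `lazy` are illustrative only:

```python
import re

text = 'isaac morka morka morka'

# Greedy: [\s\w]+ consumes as much as possible, then backtracks only far
# enough for the lookahead to succeed, which is just before the LAST 'morka'.
greedy = re.findall(r'isaac[\s\w]+(?=morka)', text)

# Lazy: [\s\S]+? consumes as little as possible, so the match ends
# at the FIRST 'morka'.
lazy = re.findall(r'isaac[\s\S]+?morka', text)

print(greedy)  # ['isaac morka morka ']
print(lazy)    # ['isaac morka']
```

Note that the greedy match ends with a trailing space, because the lookahead checks for morka without consuming it.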
CodePudding user response:
You can use
import re
rgx = r'\bisaac\b|\bmorka\b(?!.*\bmorka\b)'
text = 'isaac morka morka morka'  # renamed from str to avoid shadowing the built-in
re.findall(rgx, text)
#=> ['isaac', 'morka']
Let's break down the regular expression.
\bisaac\b # match 'isaac' with word boundaries fore and aft
| # or
\bmorka\b # match 'morka' with word boundaries fore and aft
(?! # begin negative lookahead
.* # match zero or more characters (other than line terminators)
\bmorka\b # match 'morka' with word boundaries fore and aft
) # end negative lookahead
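The same breakdown can live next to the code by compiling the pattern with `re.VERBOSE`, which ignores whitespace in the pattern and allows `#` comments. A sketch, with `pattern` and `text` as illustrative names:

```python
import re

pattern = re.compile(r"""
    \bisaac\b          # 'isaac' as a whole word
    |                  # or
    \bmorka\b          # 'morka' as a whole word
    (?!.*\bmorka\b)    # not followed later in the string by another 'morka'
""", re.VERBOSE)

text = 'isaac morka morka morka'
words = pattern.findall(text)

print(words)            # ['isaac', 'morka']
print(' '.join(words))  # isaac morka
```

Joining the matches with a space gives the single string asked for in the question.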
If instead you want a list of the unique words in the string, with each word appearing exactly once in the list, you could write the following.
text = "isaac morka louie morka isaac morka"
rgx = r'\b(\w+)\b(?!.*\b\1\b)'
re.findall(rgx, text)
#=> ['louie', 'isaac', 'morka']
\b(\w+)\b # match one or more word characters with word boundaries
# fore and aft and save to capture group 1
(?! # begin negative lookahead
.* # match zero or more characters
\b\1\b # match the content of capture group 1 with word boundaries
# fore and aft
) # end negative lookahead
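Because the lookahead rejects every word that occurs again later, each word is kept at its last occurrence, which explains the order of the result. If first-occurrence order is preferred, one alternative (not regex-only, and not part of the answer above) is `dict.fromkeys`; the names `last_seen` and `first_seen` are illustrative:

```python
import re

text = "isaac morka louie morka isaac morka"

# Regex approach: each word is kept at its LAST occurrence.
last_seen = re.findall(r'\b(\w+)\b(?!.*\b\1\b)', text)

# Alternative: dict.fromkeys keeps the FIRST occurrence of each key,
# relying on dicts preserving insertion order (Python 3.7+).
first_seen = list(dict.fromkeys(re.findall(r'\w+', text)))

print(last_seen)   # ['louie', 'isaac', 'morka']
print(first_seen)  # ['isaac', 'morka', 'louie']
```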
CodePudding user response:
The regex is the same as the one @anotherGatsby used. The snippet below gives the required result.
import re

text = 'isaac morka morka morka'
rex = re.compile(r'isaac[\s\S]+?morka')  # lazy +? stops at the first 'morka'
print(rex.findall(text))
#=> ['isaac morka']