Find repeating substring within two substrings with regex

I have strings like "sdfds asd bsd bsd bsd cdf sadasd". The number of "bsd" occurrences can vary. I want to extract "asd bsd bsd bsd cdf".

I used the following expression, but it does not match: "(asd)\s(bsd)*\s(cdf)"

CodePudding user response：

If you only need the full match, the pattern can do without capturing groups (a repeated capture group keeps only the value of its last iteration). The key change is to move the leading whitespace char inside the repetition, so that each iteration matches a whitespace char followed by bsd.

The word boundaries prevent a partial word match.

\basd(?:\sbsd)*\scdf\b
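
A quick sketch of why the pattern from the question fails and what "keeps only the last iteration" means in practice. The variable names here are only for illustration, and the second pattern deliberately uses a capturing group instead of the non-capturing one above:

```python
import re

s = "sdfds asd bsd bsd bsd cdf sadasd"

# Pattern from the question: (bsd)* cannot consume the spaces between
# the repetitions, so the overall match fails.
original = re.search(r"(asd)\s(bsd)*\s(cdf)", s)
print(original)

# Whitespace moved inside the repetition, but with a capturing group:
# the full match is found, yet group 1 holds only the last iteration.
captured = re.search(r"\basd(\sbsd)*\scdf\b", s)
print(repr(captured.group(0)))
print(repr(captured.group(1)))
```

So the group itself is not useful for collecting all the repetitions, which is why the non-capturing form (?:...) together with m.group() is enough here.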

Example

import re

s = "sdfds asd bsd bsd bsd cdf sadasd"
pattern = r"\basd(?:\sbsd)*\scdf\b"

m = re.search(pattern, s)
if m:
    print(m.group())

Output

asd bsd bsd bsd cdf
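
Since the number of "bsd" can vary, here is a small sketch that runs the same pattern against a few generated inputs. The helper name extract and the test strings are made up for this example:

```python
import re

pattern = re.compile(r"\basd(?:\sbsd)*\scdf\b")

def extract(text):
    """Return the matched 'asd ... cdf' part, or None if there is no match."""
    m = pattern.search(text)
    return m.group() if m else None

# Build inputs with 0, 1 and 2 repetitions of " bsd".
results = [extract("x asd" + " bsd" * n + " cdf y") for n in range(3)]
print(results)
```

Note that * also accepts zero repetitions, so "asd cdf" matches as well. If at least one "bsd" should be required, (?:\sbsd)+ would be the variant to try.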

CodePudding user response：

Does this work for you?

import re as regex

text = 'sdfds asd bsd bsd bsd cdf sadasd'
out = regex.findall(r'\b[a-z]{3}\b', text)
print(out)
# ['asd', 'bsd', 'bsd', 'bsd', 'cdf']

print(' '.join(out))
# asd bsd bsd bsd cdf
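
One thing to watch with this approach: \b[a-z]{3}\b matches any standalone three-letter lowercase word, so it works for the sample string only because the surrounding words have other lengths. A small sketch with a made-up input (the string and variable names are assumptions for illustration):

```python
import re

text = "the asd bsd bsd cdf end"

# Any three-letter lowercase word is picked up, including "the" and "end".
loose = re.findall(r"\b[a-z]{3}\b", text)
print(loose)

# Anchoring on the literal words keeps only the wanted span.
m = re.search(r"\basd(?:\sbsd)*\scdf\b", text)
strict = m.group() if m else None
print(strict)
```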