I have strings of the following kind: "sdfds asd bsd bsd bsd cdf sadasd".
The number of "bsd" occurrences can vary. I want to extract "asd bsd bsd bsd cdf".
I used the following expression, but it does not match: "(asd)\s(bsd)*\s(cdf)"
CodePudding user response:
If you only need the whole match, you can drop the capturing groups (a repeated capture group keeps only the value of its last iteration) and move the leading whitespace character into the repetition, so that each repeated unit is "\sbsd".
The word boundaries prevent a partial word match.
\basd(?:\sbsd)*\scdf\b
Example
import re
s = "sdfds asd bsd bsd bsd cdf sadasd"
pattern = r"\basd(?:\sbsd)*\scdf\b"
m = re.search(pattern, s)
if m:
    print(m.group())
Output
asd bsd bsd bsd cdf
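As a rough illustration of the two points above, the sketch below first runs the expression from the question, which finds nothing because (bsd)* repeats "bsd" with no whitespace between repetitions. It then runs a capturing variant of the answer's pattern to show that the group holds only its last iteration. The variable names are made up for this example.

```python
import re

s = "sdfds asd bsd bsd bsd cdf sadasd"

# Expression from the question: (bsd)* can only match "bsdbsdbsd",
# so the space-separated repetitions make the whole pattern fail.
original = re.search(r"(asd)\s(bsd)*\s(cdf)", s)
print(original)  # None

# Capturing variant of the answer's pattern, used here only to inspect the group.
captured = re.search(r"\basd(\sbsd)*\scdf\b", s)
print(captured.group(0))        # asd bsd bsd bsd cdf
print(repr(captured.group(1)))  # ' bsd' (last iteration only)
```

Since the group's content is not needed here, the non-capturing form (?:\sbsd)* avoids the misleading partial capture.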
CodePudding user response:
Does this work for you?
import re as regex
text = 'sdfds asd bsd bsd bsd cdf sadasd'
out = regex.findall(r'\b[a-z]{3}\b', text)
print(out)
# ['asd', 'bsd', 'bsd', 'bsd', 'cdf']
print(' '.join(out))
# asd bsd bsd bsd cdf
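One caveat worth checking before relying on this: \b[a-z]{3}\b matches any three-letter lowercase word, not only "asd", "bsd" and "cdf". The sketch below uses a hypothetical input (not from the question) in which an unrelated three-letter word comes first, and compares the result with the anchored pattern from the other answer.

```python
import re

# Hypothetical input: "foo" is an unrelated three-letter word.
text = "foo asd bsd bsd cdf sadasd"

# The generic pattern picks up every three-letter word, including "foo".
three_letter_words = re.findall(r"\b[a-z]{3}\b", text)
print(three_letter_words)  # ['foo', 'asd', 'bsd', 'bsd', 'cdf']

# The anchored pattern only matches the asd ... cdf sequence.
anchored = re.search(r"\basd(?:\sbsd)*\scdf\b", text)
print(anchored.group())  # asd bsd bsd cdf
```

If the surrounding words are guaranteed never to be three letters long, the findall approach is fine; otherwise the anchored pattern is the safer choice.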