I need to extract the word "lack" together with the one or three words that follow it from free text, using a regex:
import re
import string
Text = "lack of stair handrails, slippery surfaces, tripping hazards, lack of bathroom grab bars, lack floor"
new_data = re.search(r"(lack (\w+\W+){3})", Text)
print(new_data.group())
The result I got is only a single match:
lack of stair handrails,
But I need the result to be:
lack of stair handrails
lack of bathroom grab bars
lack floor
Thanks in advance
CodePudding user response:
You can match at least 1 word after lack, then exclude the comma from \W by using [^\w,] as the separator, and repeat that group 0-2 times so there can be 1-3 words after lack.
Note that if you want a maximum of 3 words after lack, the match for the text lack of bathroom grab bars will be lack of bathroom grab.
If you want to match 1 or more words after it, you can change {0,2} to *
\black \w+(?:[^\w,]\w+){0,2}
If another lack should not be matched as one of the following words, you can add a negative lookahead before each word:
\black (?!lack\b)\w+(?:[^\w,](?!lack\b)\w+){0,2}
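To see how the variants above behave, here is a minimal sketch that runs them with re.findall on the text from the question. The second sample string (text_repeated) and the variable names are made up for illustration only.

```python
import re

text = ("lack of stair handrails, slippery surfaces, tripping hazards, "
        "lack of bathroom grab bars, lack floor")

# At most three words after "lack"; a comma can never be the separator.
up_to_three = re.compile(r"\black \w+(?:[^\w,]\w+){0,2}")
# Any number of words after "lack", still stopping at a comma.
any_number = re.compile(r"\black \w+(?:[^\w,]\w+)*")
# Same as up_to_three, but a following word may not be "lack" itself.
no_second_lack = re.compile(r"\black (?!lack\b)\w+(?:[^\w,](?!lack\b)\w+){0,2}")

capped = up_to_three.findall(text)
uncapped = any_number.findall(text)
print(capped)    # ['lack of stair handrails', 'lack of bathroom grab', 'lack floor']
print(uncapped)  # ['lack of stair handrails', 'lack of bathroom grab bars', 'lack floor']

# Hypothetical input where "lack" occurs again within three words.
text_repeated = "lack of light lack of air"
greedy = up_to_three.findall(text_repeated)
guarded = no_second_lack.findall(text_repeated)
print(greedy)    # ['lack of light lack']
print(guarded)   # ['lack of light', 'lack of air']
```

Because every group is non-capturing, findall returns the whole match rather than the last repetition of a group.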
CodePudding user response:
If you're working in Python 3.9 or later, you might like to try out an open-source package I published recently called pregex. With pregex, you can build your pattern like this:
from pregex import *
pre = \
    "lack" + \
    op.Either(
        3 * (tk.Space() + Word()),
        tk.Space() + Word()
    )
You can then even fetch the underlying regex pattern:
regex = pre.get_pattern()
which returns the RegEx pattern that you want:
lack(?:(?: \b\w+\b){3}| \b\w+\b)
Note though that the above pattern will result in the following matches:
['lack of stair handrails', 'lack of bathroom grab', 'lack floor']
Since you wanted 1 or 3 words after "lack", the match "lack of bathroom grab" does not include the word "bars", though this can be easily fixed:
pre = \
    "lack" + \
    op.Either(
        qu.AtLeastAtMost(tk.Space() + Word(), n=3, m=4),
        tk.Space() + Word()
    )
which results in the following pattern:
lack(?:(?: \b\w+\b){3,4}| \b\w+\b)
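If you only want to check what the two generated patterns match, you can paste them into the standard re module directly. This is just a quick sketch using the text from the question; it does not use pregex itself, and the variable names are my own.

```python
import re

text = ("lack of stair handrails, slippery surfaces, tripping hazards, "
        "lack of bathroom grab bars, lack floor")

# First pattern: exactly three words after "lack", or else exactly one.
three_or_one = re.compile(r"lack(?:(?: \b\w+\b){3}| \b\w+\b)")
# Second pattern: three or four words after "lack", or else exactly one.
three_four_or_one = re.compile(r"lack(?:(?: \b\w+\b){3,4}| \b\w+\b)")

first = three_or_one.findall(text)
second = three_four_or_one.findall(text)
print(first)   # ['lack of stair handrails', 'lack of bathroom grab', 'lack floor']
print(second)  # ['lack of stair handrails', 'lack of bathroom grab bars', 'lack floor']
```

The alternation tries the longer branch first, so "lack floor" only falls back to the single-word branch after the three-word branch fails.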
CodePudding user response:
You can use (\black\b[^,]*)
Explanation:
\b limits the match to the word 'lack' rather than that substring inside another word; [^,]* matches any run of characters other than a ','.
Python:
>>> import re
>>> s="lack of stair handrails, slippery surfaces, tripping hazards, lack of bathroom grab bars, lack floor"
>>> re.findall(r'\black\b[^,]*',s)
['lack of stair handrails', 'lack of bathroom grab bars', 'lack floor']
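To illustrate what the \b anchors buy you, here is a small sketch comparing the pattern with and without them. The sample string is invented for the comparison and is not from the question.

```python
import re

# Hypothetical input containing "lack" inside other words.
sample = "slack rope, lacking support, lack of light"

# Without word boundaries, "lack" is also found inside "slack" and "lacking".
unanchored = re.findall(r"lack[^,]*", sample)
# With \b on both sides, only the standalone word "lack" starts a match.
anchored = re.findall(r"\black\b[^,]*", sample)

print(unanchored)  # ['lack rope', 'lacking support', 'lack of light']
print(anchored)    # ['lack of light']
```

Keep in mind that [^,]* places no limit on the number of words: it takes everything up to the next comma or the end of the string.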