RegEx using python extracting specific word followed with other three word more than once

Time:09-10

I need to extract the word lack together with the one or 3 words that follow it, from free text, using a regex:

import re
import string
Text = "lack of stair handrails, slippery surfaces, tripping hazards, lack of bathroom grab bars, lack floor"
new_data = re.search(r"(lack (\w+\W+){3})", Text)

print(new_data.group())

The result I get is only one match:

lack of stair handrails,

but I need the result to be

lack of stair handrails
lack of bathroom grab bars
lack floor

Thanks in advance

CodePudding user response:

You can match at least 1 word after lack, then repeat 0-2 times a group made of non-word characters (with the comma excluded from \W) followed by another word, so that 1-3 words can follow lack.

Note that with a maximum of 3 words after lack, the match for the text lack of bathroom grab bars will be lack of bathroom grab.

If you want to match 1 or more words after it, you can change {0,2} to *

\black \w+(?:[^\w,]+\w+){0,2}
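As a quick check, here is a minimal sketch that runs the bounded pattern and the `*` variant against the text from the question with re.findall. The variable names are only illustrative.

```python
import re

text = ("lack of stair handrails, slippery surfaces, tripping hazards, "
        "lack of bathroom grab bars, lack floor")

# "lack", one word, then up to two more words; separators may not contain a comma.
bounded = re.findall(r"\black \w+(?:[^\w,]+\w+){0,2}", text)

# Same idea with * instead of {0,2}: keep taking words until a comma is reached.
unbounded = re.findall(r"\black \w+(?:[^\w,]+\w+)*", text)

print(bounded)    # ['lack of stair handrails', 'lack of bathroom grab', 'lack floor']
print(unbounded)  # ['lack of stair handrails', 'lack of bathroom grab bars', 'lack floor']
```

The `*` variant is the one that reproduces the output asked for in the question, since it does not cut off after the third word.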


If there should not be another lack matched, you can check the matched word after it:

\black (?!lack\b)\w+(?:[^\w,]+(?!lack\b)\w+){0,2}
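The lookahead only makes a difference when two lack phrases are not separated by a comma. A small sketch, using a made-up sample string (not from the question), compares the two patterns:

```python
import re

# Hypothetical sample without commas, so the two "lack" phrases run together.
sample = "lack of space lack of light"

# Without the lookahead, the second "lack" is swallowed as the third word.
plain = re.findall(r"\black \w+(?:[^\w,]+\w+){0,2}", sample)

# With (?!lack\b), a following word may not itself be "lack".
guarded = re.findall(r"\black (?!lack\b)\w+(?:[^\w,]+(?!lack\b)\w+){0,2}", sample)

print(plain)    # ['lack of space lack']
print(guarded)  # ['lack of space', 'lack of light']
```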


CodePudding user response:

If you're working in Python 3.9 you might like to try out an open source package I published recently called pregex. By using pregex, you can build your pattern as such:

from pregex import *

pre = \
    "lack" + \
    op.Either(
        3 * (tk.Space() + Word()),
        tk.Space() + Word()
    )

You can then even fetch the underlying regex pattern:

regex = pre.get_pattern()

which returns the RegEx pattern that you want:

lack(?:(?: \b\w+\b){3}| \b\w+\b)

Note though that the above pattern will result in the following matches:

['lack of stair handrails', 'lack of bathroom grab', 'lack floor']

Since you wanted 1 or 3 words after "lack", the match "lack of bathroom grab" does not include the word "bars", though this can be easily fixed:

pre = \
    "lack" + \
    op.Either(
        qu.AtLeastAtMost(tk.Space() + Word(), n=3, m=4),
        tk.Space() + Word()
    )

which results in the following pattern:

lack(?:(?: \b\w+\b){3,4}| \b\w+\b)
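If you just want to see what the two generated patterns match without installing pregex, a rough sketch with the standard re module (assuming the pattern strings are exactly the ones shown above) could look like this:

```python
import re

text = ("lack of stair handrails, slippery surfaces, tripping hazards, "
        "lack of bathroom grab bars, lack floor")

# First pattern: exactly three words after "lack", or else a single word.
three_or_one = re.findall(r"lack(?:(?: \b\w+\b){3}| \b\w+\b)", text)

# Adjusted pattern: three to four words after "lack", or else a single word.
three_to_four_or_one = re.findall(r"lack(?:(?: \b\w+\b){3,4}| \b\w+\b)", text)

print(three_or_one)
# ['lack of stair handrails', 'lack of bathroom grab', 'lack floor']
print(three_to_four_or_one)
# ['lack of stair handrails', 'lack of bathroom grab bars', 'lack floor']
```

Both patterns contain only non-capturing groups, so re.findall returns the full matches.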

CodePudding user response:

You can use (\black\b[^,]*)


Explanation:

  1. \b limits the match to the whole word 'lack', so it is not matched as a substring inside another word;
  2. [^,]* matches any run of characters other than ','.

Python:

>>> import re
>>> s="lack of stair handrails, slippery surfaces, tripping hazards, lack of bathroom grab bars, lack floor"
>>> re.findall(r'\black\b[^,]*',s)
['lack of stair handrails', 'lack of bathroom grab bars', 'lack floor'] 