I am trying to write a regular expression that matches strings containing a certain phrase (Professional Entity or Inc.). The closest I have got is the following:
(?i)(?u)(?<!\S)(((Inc)\.)|(Professional\sEntity))(?!\S)
However, it fails when the phrase is directly followed by a special character such as a comma or a hyphen.
Sample strings that should match:
test PROFESSIONAL ENTITY new
test inc. new
test inc., new
test inc.,new
inc., new test
PROFESSIONAL ENTITY new
PROFESSIONAL ENTITY new test
PROFESSIONAL ENTITY, new
PROFESSIONAL ENTITY,new
test PROFESSIONAL ENTITY,
PROFESSIONAL ENTITY,
PROFESSIONAL ENTITY, new test
PROFESSIONAL ENTITY,new test
PROFESSIONAL ENTITY-new test
PROFESSIONAL ENTITY- new test
Sample strings that should not match:
PROFESSIONAL ENTITYnew test
test inc.test
test PROFESSIONAL ENTITYnew
testPROFESSIONAL ENTITY new
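A quick way to see exactly which samples the pattern rejects is to run it over both lists. This sketch assumes Python's re module (the question does not name the engine) and merges (?i)(?u) into a single (?iu) group, which is equivalent for str patterns. The names ORIGINAL, SHOULD_MATCH and SHOULD_NOT_MATCH are illustrative.

```python
import re

# The question's pattern, with (?i)(?u) merged into one (?iu) flag group.
ORIGINAL = re.compile(r"(?iu)(?<!\S)(((Inc)\.)|(Professional\sEntity))(?!\S)")

SHOULD_MATCH = [
    "test PROFESSIONAL ENTITY new",
    "test inc. new",
    "test inc., new",
    "test inc.,new",
    "inc., new test",
    "PROFESSIONAL ENTITY new",
    "PROFESSIONAL ENTITY new test",
    "PROFESSIONAL ENTITY, new",
    "PROFESSIONAL ENTITY,new",
    "test PROFESSIONAL ENTITY,",
    "PROFESSIONAL ENTITY,",
    "PROFESSIONAL ENTITY, new test",
    "PROFESSIONAL ENTITY,new test",
    "PROFESSIONAL ENTITY-new test",
    "PROFESSIONAL ENTITY- new test",
]

SHOULD_NOT_MATCH = [
    "PROFESSIONAL ENTITYnew test",
    "test inc.test",
    "test PROFESSIONAL ENTITYnew",
    "testPROFESSIONAL ENTITY new",
]

# Samples the pattern rejects even though they should match.
missed = [s for s in SHOULD_MATCH if not ORIGINAL.search(s)]
# Samples the pattern accepts even though they should not match.
false_hits = [s for s in SHOULD_NOT_MATCH if ORIGINAL.search(s)]

print("missed:", missed)
print("false hits:", false_hits)
```

Every missed sample has a comma or hyphen right after the phrase, while all of the should-not-match samples are correctly rejected, which points at the right-hand lookahead.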
Answer:
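Your pattern has (?!\S) at the end, which applies to both alternatives. That lookahead asserts that the match is followed by whitespace or the end of the string, so it will not match inc., (or PROFESSIONAL ENTITY, and PROFESSIONAL ENTITY-), because a comma or hyphen is not whitespace.
You could for example use a word boundary \b on both ends to prevent partial word matches (for the left side you could still use (?<!\S) instead).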
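To isolate the right-hand check, here is a small comparison on the Professional Entity alternative alone, again assuming Python's re module; whitespace_after and boundary_after are illustrative names. Swapping (?!\S) for \b lets a comma or hyphen follow the phrase while still rejecting a letter.

```python
import re

# (?!\S): the next character must be whitespace, or there is no next character.
whitespace_after = re.compile(r"(?i)Professional\sEntity(?!\S)")
# \b after "Entity": the next character must be a non-word character, or there is none.
boundary_after = re.compile(r"(?i)Professional\sEntity\b")

samples = [
    "PROFESSIONAL ENTITY, new",      # comma right after the phrase
    "PROFESSIONAL ENTITY-new test",  # hyphen right after the phrase
    "PROFESSIONAL ENTITYnew test",   # letter right after: must be rejected
]

# Map each sample to (matches with (?!\S), matches with \b).
results = {
    s: (bool(whitespace_after.search(s)), bool(boundary_after.search(s)))
    for s in samples
}
for s, (ws, wb) in results.items():
    print(repr(s), "(?!\\S):", ws, "\\b:", wb)
```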
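A \b cannot follow the period in Inc., though: the period is not a word character, so there is no boundary between it and a following space or comma. Instead, assert that no word character follows Inc. using a negative lookahead (?!\w):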
(?iu)\b(((Inc)\.(?!\w))|(Professional\sEntity\b))
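To see why the lookahead is needed after the period, this sketch (assuming Python's re module; boundary_after_dot is a hypothetical variant, not part of the answer) compares \b and (?!\w) right after Inc. on a few of the samples.

```python
import re

# Hypothetical variant: \b placed right after the period.
boundary_after_dot = re.compile(r"(?i)\bInc\.\b")
# The answer's approach: no word character may follow the period.
lookahead_after_dot = re.compile(r"(?i)\bInc\.(?!\w)")

samples = ["test inc. new", "test inc.,new", "inc., new test", "test inc.test"]

# Map each sample to (matches with \b after the period, matches with (?!\w)).
results = {
    s: (bool(boundary_after_dot.search(s)), bool(lookahead_after_dot.search(s)))
    for s in samples
}
for s, (wb, la) in results.items():
    print(repr(s), "\\b:", wb, "(?!\\w):", la)
```

The \b variant gets every case backwards: a boundary exists only between the period and a word character, so it matches inc.test and rejects inc. new, whereas (?!\w) accepts the first three samples and rejects the last.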
If you don't need the capture groups:
(?iu)\b(?:Inc\.(?!\w)|Professional\sEntity\b)
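As a sanity check, the non-capturing pattern can be run against every sample from the question. This assumes Python's re module and uses re.search, since the phrase may appear anywhere in the string; PATTERN, SHOULD_MATCH and SHOULD_NOT_MATCH are illustrative names.

```python
import re

# Non-capturing version of the answer's pattern.
PATTERN = re.compile(r"(?iu)\b(?:Inc\.(?!\w)|Professional\sEntity\b)")

SHOULD_MATCH = [
    "test PROFESSIONAL ENTITY new",
    "test inc. new",
    "test inc., new",
    "test inc.,new",
    "inc., new test",
    "PROFESSIONAL ENTITY new",
    "PROFESSIONAL ENTITY new test",
    "PROFESSIONAL ENTITY, new",
    "PROFESSIONAL ENTITY,new",
    "test PROFESSIONAL ENTITY,",
    "PROFESSIONAL ENTITY,",
    "PROFESSIONAL ENTITY, new test",
    "PROFESSIONAL ENTITY,new test",
    "PROFESSIONAL ENTITY-new test",
    "PROFESSIONAL ENTITY- new test",
]

SHOULD_NOT_MATCH = [
    "PROFESSIONAL ENTITYnew test",
    "test inc.test",
    "test PROFESSIONAL ENTITYnew",
    "testPROFESSIONAL ENTITY new",
]

# Every positive sample matches and every negative sample is rejected.
assert all(PATTERN.search(s) for s in SHOULD_MATCH)
assert not any(PATTERN.search(s) for s in SHOULD_NOT_MATCH)

# The match covers only the phrase itself, not the punctuation after it.
print(PATTERN.search("test inc.,new").group())                 # inc.
print(PATTERN.search("PROFESSIONAL ENTITY-new test").group())  # PROFESSIONAL ENTITY
```

In Python 3, \b and \w are already Unicode-aware for str patterns, so the u flag is redundant there but harmless.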