Regular expression to check if a word is within a string including delimiters

Time:07-01

I am trying to write a regular expression that matches strings containing a certain word (Professional Entity or Inc.). The closest I have got is the following:

(?i)(?u)(?<!\S)(((Inc)\.)|(Professional\sEntity))(?!\S)

However, this fails when the word is followed by a special character such as , or -.

Sample strings that should work:

test PROFESSIONAL ENTITY new
test inc. new
test inc., new
test inc.,new
inc., new test
PROFESSIONAL ENTITY new
PROFESSIONAL ENTITY new test
PROFESSIONAL ENTITY, new
PROFESSIONAL ENTITY,new
test PROFESSIONAL ENTITY,
PROFESSIONAL ENTITY,
PROFESSIONAL ENTITY, new test
PROFESSIONAL ENTITY,new test
PROFESSIONAL ENTITY-new test
PROFESSIONAL ENTITY- new test

Sample strings that should not work:

PROFESSIONAL ENTITYnew test
test inc.test
test PROFESSIONAL ENTITYnew
testPROFESSIONAL ENTITY new

CodePudding user response:

Your pattern has (?!\S) at the end, which applies to both alternatives. Asserting a whitespace boundary to the right will not match inc., because the comma that follows is a non-whitespace character.
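To see the failure concretely, here is a minimal Python sketch. It assumes Python's re module (the question does not name a regex flavor), merges the two inline flags into (?iu), and uses a hypothetical helper name, original_matches, purely for illustration.

```python
import re

# The question's original pattern, with (?i)(?u) merged into (?iu).
ORIGINAL = re.compile(r"(?iu)(?<!\S)(((Inc)\.)|(Professional\sEntity))(?!\S)")

def original_matches(text: str) -> bool:
    """Return True if the original pattern finds a match anywhere in text."""
    return ORIGINAL.search(text) is not None

# A space follows "inc.", so (?!\S) succeeds.
print(original_matches("test inc. new"))                 # True
# A comma follows "inc.", so (?!\S) rejects the match.
print(original_matches("test inc., new"))                # False
# A hyphen follows "ENTITY", so (?!\S) rejects the match.
print(original_matches("PROFESSIONAL ENTITY-new test"))  # False
```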


You could, for example, use a word boundary \b at both ends to prevent partial word matches (for the left side you could still keep (?<!\S)).

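Then assert that no word character follows inc. using a negative lookahead (?!\w). A \b would not work in that position, because after the dot it would require a word character to follow.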

(?iu)\b(((Inc)\.(?!\w))|(Professional\sEntity\b))


If you don't need the capture groups:

(?iu)\b(?:Inc\.(?!\w)|Professional\sEntity\b)
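As a quick check, the non-capturing pattern can be run against the sample strings from the question. This is only a sketch, assuming Python's re module; the names PATTERN, contains_entity, SHOULD_MATCH and SHOULD_NOT_MATCH are made up for the example.

```python
import re

# The non-capturing pattern from the answer, unchanged.
PATTERN = re.compile(r"(?iu)\b(?:Inc\.(?!\w)|Professional\sEntity\b)")

def contains_entity(text: str) -> bool:
    """Return True if text contains 'Inc.' or 'Professional Entity' as a whole word."""
    return PATTERN.search(text) is not None

# Sample strings copied from the question.
SHOULD_MATCH = [
    "test PROFESSIONAL ENTITY new",
    "test inc. new",
    "test inc., new",
    "test inc.,new",
    "inc., new test",
    "PROFESSIONAL ENTITY new",
    "PROFESSIONAL ENTITY new test",
    "PROFESSIONAL ENTITY, new",
    "PROFESSIONAL ENTITY,new",
    "test PROFESSIONAL ENTITY,",
    "PROFESSIONAL ENTITY,",
    "PROFESSIONAL ENTITY, new test",
    "PROFESSIONAL ENTITY,new test",
    "PROFESSIONAL ENTITY-new test",
    "PROFESSIONAL ENTITY- new test",
]
SHOULD_NOT_MATCH = [
    "PROFESSIONAL ENTITYnew test",
    "test inc.test",
    "test PROFESSIONAL ENTITYnew",
    "testPROFESSIONAL ENTITY new",
]

# Collect any sample that behaves differently from what the question expects.
failures = [s for s in SHOULD_MATCH if not contains_entity(s)]
failures += [s for s in SHOULD_NOT_MATCH if contains_entity(s)]
print(failures)  # an empty list means every sample behaved as expected
```

In other flavors the inline flags may behave differently (in Java, for instance, (?u) enables Unicode-aware case folding), so the samples are worth re-running in the target engine.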