Regex not working for omission of specific character string

Time:06-24

I am currently using this regex to isolate 5-digit values that are not preceded or followed by a digit, a dash, or a period. In addition, I am trying to figure out a way to exclude lines that contain WO or PO, in any letter case. I have tried different variations of where to put WO and PO as conditions to check against, but fail at every turn.

(?<![-0-9.])[0-9]{5}(?![-0-9.])

Current output - Not desired

asdkjflsdf 12345     good
asdfsdf 1234asdfsdf  bad
12345                good
.12345.              bad
-12345               bad
SO 12345             good
123456 ppp           bad
1234                 bad
PO12345              good <--
Wo 12345             good <--

Output - Desired

asdkjflsdf 12345     good
asdfsdf 1234asdfsdf  bad
12345                good
.12345.              bad
-12345               bad
SO 12345             good
123456 ppp           bad
1234                 bad
PO12345              bad <--
Wo 12345             bad <--

Any help would be greatly appreciated. Thank you

CodePudding user response:

A two-pass approach, as ti7 suggests, may indeed offer the simplest solution:

'asdkjflsdf 12345',
'asdfsdf 1234asdfsdf',
'12345',
'.12345.',
'-12345',
'SO 12345',
'123456 ppp',
'1234',
'PO12345',
'Wo 12345',
'CompanName WO# 12345' |
  ForEach-Object {
    [pscustomobject] @{
      Input = $_
      Result = $_ -match '(?<![-0-9.])[0-9]{5}(?![-0-9.])' -and $_ -notmatch '[wp]o'
    }
  }

Output:

Input                Result
-----                ------
asdkjflsdf 12345       True
asdfsdf 1234asdfsdf   False
12345                  True
.12345.               False
-12345                False
SO 12345               True
123456 ppp            False
1234                  False
PO12345               False
Wo 12345              False
CompanName WO# 12345  False
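
Two properties of the second test are worth keeping in mind, and the short sketch below illustrates them. It assumes only the standard PowerShell comparison operators; the input 'two 12345' is a made-up example, not one from the question.

```powershell
# -notmatch ignores case, so 'Wo', 'WO', and 'wo' are all rejected.
$insensitive = 'Wo 12345' -notmatch '[wp]o'      # $false

# -cnotmatch is the case-sensitive variant; '[wp]o' then only looks for lowercase 'wo' or 'po'.
$sensitive = 'Wo 12345' -cnotmatch '[wp]o'       # $true

# The test scans the whole line, so an unrelated word containing 'wo' or 'po' also rejects it.
# ('two 12345' is a made-up input for illustration.)
$unrelated = 'two 12345' -notmatch '[wp]o'       # $false
```

If rejecting words such as "two" or "power" is a problem, the second pattern probably needs to be tightened, or the check moved into a lookbehind next to the digits.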

CodePudding user response:

You can probably use a much simpler regex and then run a second round that excludes the undesirable matches.

Round 1 (exactly 5 digits with a word boundary)

^.*\b\d{5}\b.*$

Round 2 (exclude any unwanted matches)

^(?!.*[WwPp][Oo]).*$
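
As a rough sketch, the two rounds could be combined in PowerShell as shown below. Round 2 is written here as a `-notmatch '[wp]o'` test, which is an assumption about how the exclusion would be applied; the variable names are illustrative. Note that the word-boundary pattern from round 1 is looser than the regex in the question: it still accepts '.12345.' and '-12345', because `\b` treats a period or dash as a boundary.

```powershell
# A few of the sample inputs from the question.
$inputs = 'asdkjflsdf 12345', '.12345.', '-12345', 'SO 12345', 'PO12345', 'Wo 12345'

$results = $inputs | ForEach-Object {
  # Round 1: the line contains a run of exactly 5 digits between word boundaries.
  $round1 = $_ -match '^.*\b\d{5}\b.*$'
  # Round 2 (assumed form): the line must not contain WO or PO, in any case.
  $round2 = $_ -notmatch '[wp]o'
  [pscustomobject] @{
    Input  = $_
    Round1 = $round1
    Round2 = $round2
    Result = $round1 -and $round2
  }
}

$results
```

With these inputs, 'Wo 12345' passes round 1 and is rejected by round 2, while 'PO12345' already fails round 1 because there is no word boundary between the letter O and the first digit.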

CodePudding user response:

In PowerShell, you can use

$s -match '(?<![-0-9.])(?<![pw]o[\W_]*)[0-9]{5}(?![-0-9.])'

See the regex demo.

The pattern now contains two negative lookbehinds. In detail:

  • (?<![-0-9.]) - immediately to the left, there should be no ASCII digit, - or . char
  • (?<![pw]o[\W_]*) - immediately before the current location, there should be no PO or WO substring (in any case, since -match is case-insensitive) followed by zero or more non-alphanumeric chars; the .NET regex engine allows a variable-length pattern such as [\W_]* inside a lookbehind
  • [0-9]{5} - five ASCII digits
  • (?![-0-9.]) - immediately to the right, there should be no -, . or ASCII digit.
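
To check the single-pattern approach against the sample inputs from the question, something like the following sketch could be used. The variable names `$pattern`, `$samples`, and `$results` are illustrative only.

```powershell
# The combined pattern from this answer; -match applies it case-insensitively.
$pattern = '(?<![-0-9.])(?<![pw]o[\W_]*)[0-9]{5}(?![-0-9.])'

# Sample inputs taken from the question, plus the 'WO#' case from the first answer.
$samples = 'asdkjflsdf 12345', 'asdfsdf 1234asdfsdf', '12345', '.12345.', '-12345',
           'SO 12345', '123456 ppp', '1234', 'PO12345', 'Wo 12345', 'CompanName WO# 12345'

$results = $samples | ForEach-Object {
  [pscustomobject] @{
    Input  = $_
    Result = $_ -match $pattern
  }
}

$results
```

Only 'asdkjflsdf 12345', '12345', and 'SO 12345' should come back as True. Unlike the two-pass test, this pattern only rejects a WO or PO that sits directly before the digits (optionally separated by non-alphanumeric characters), not one that appears elsewhere in the line.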