Powershell Regex question. Escape parenthesis

Time:01-11

Been beating my head around this one all day and I'm getting close but not quite getting there. I have a small subset of my much larger script for just the regex part. Here is the script so far:

  
  $CCI_ID = @(
  "003417 AR-2.1"
  "003425 AR-2.9"
  "003392 AP-1.12"
  "009012 APP-1(21).1"
  )


  [regex]::matches($CCI_ID, '(\d{1,})|([a-zA-Z]{2}[-][\d][\(?\){0,1}[.][\d]{1,})') | 
      ForEach-Object {
        if($_.Groups[1].Value.length -gt 0){
          write-host $('CCI-' + $_.Groups[1].Value.trim())}
        else{$_.Groups[2].Value.trim()}
      }  

CCI-003417
AR-2.1
CCI-003425
AR-2.9
CCI-003392
AP-1.12
CCI-009012
PP-1(21
CCI-1


The output is correct for all but the last one. It should be:
      
      CCI-009012
      APP-1(21).1

Thanks for any advice.

 

CodePudding user response:

Instead of describing and quantifying the (optional) opening and closing parenthesis separately, group them together and then make the whole group optional:

(?:\(\d+\))?

The whole pattern thus ends up looking like:

[regex]::Matches($CCI_ID, '(\d{1,})|([a-zA-Z]{2,3}[-][\d](?:\(\d+\))?[.][\d]{1,})')
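The effect of grouping the parentheses can be checked on individual values. The following is only a minimal sketch: the sample strings with unbalanced or empty parentheses are made up for illustration, and the pattern is anchored with ^ and $ (which the answer above does not do) so that each string must match as a whole.

```powershell
# Identifier part only, anchored so the whole string has to match.
# (?:\(\d+\))? = an optional, non-capturing group: "(" + one or more digits + ")"
$pattern = '^[a-zA-Z]{2,3}-\d(?:\(\d+\))?\.\d+$'

# The last two samples are hypothetical malformed inputs
$samples = 'AR-2.1', 'APP-1(21).1', 'APP-1(21.1', 'APP-1().1'

$results = foreach ($s in $samples) {
    [pscustomobject]@{
        Input   = $s
        IsMatch = $s -match $pattern
    }
}

$results
```

Because the opening and closing parenthesis belong to one group, either both appear (with digits in between) or neither does, so the two malformed samples are rejected.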

CodePudding user response:

As you are experiencing here, regular expressions can become very complex and unreadable.
Therefore it is often a good idea to view your problem from two different angles:

  • Try matching the part(s) you want, or
  • Try matching the part(s) you don't want

In your case it is probably easier to match the part that you don't want (the delimiter, i.e. the space) and split your string on that, which is apparently what you want to achieve:

$CCI_ID | Foreach-Object {
    $Split = $_ -Split '\s+', 2
    'CCI-' + $Split[0]
    $Split[1]
}

$_ -Split '\s+', 2 splits the string on one or more whitespace characters (you might also consider a literal space: -Split ' '). The , 2 prevents the string from being split into more than two parts, meaning that the second part will not be split further even if it contains spaces.
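To see what the limit changes, here is a small sketch. The input line is hypothetical (none of the sample data in the question contains a second space); it only serves to show the difference between a limited and an unlimited split.

```powershell
# Hypothetical input with extra spaces inside the identifier part
$line = '009012 APP-1(21).1 rev 2'

# Without a limit, every run of whitespace is a split point
$unlimited = $line -split '\s+'

# With a limit of 2, only the first run of whitespace splits the string
$limited = $line -split '\s+', 2

"Unlimited: $($unlimited.Count) parts"
"Limited:   $($limited.Count) parts, second part = '$($limited[1])'"
```

With the limit, everything after the first whitespace run stays together in the second element.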

CodePudding user response:

In your pattern you are using an alternation (|), but looking at the example data you can instead match the digits, then 1 or more whitespace characters, then the identifier, all in a single pattern.

If there is a match for the pattern, the group 1 value already contains 1 or more digits so you don't have to check for the Value.length

The pattern with the optional digits between parentheses:

\b(\d+)\s+([a-zA-Z]{2,}-\d(?:\(\d+\))?\.\d+)\b

See a regex101 demo.

$CCI_ID = @(
"003417 AR-2.1"
"003425 AR-2.9"
"003392 AP-1.12"
"009012 APP-1(21).1"
)

[regex]::matches($CCI_ID, '\b(\d+)\s+([a-zA-Z]{2,}-\d(?:\(\d+\))?\.\d+)\b') |
        ForEach-Object {
            write-host $( 'CCI-' + $_.Groups[1].Value.trim() )
            write-host $_.Groups[2].Value.trim()
        }

Output

CCI-003417
AR-2.1
CCI-003425
AR-2.9
CCI-003392
AP-1.12
CCI-009012
APP-1(21).1
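
As a variation on the same idea, the entries can also be matched one at a time with the -match operator instead of passing the whole array to [regex]::Matches (which receives the array converted to a single space-joined string). This is only a sketch: the group names id and ref are my own, and anchoring each entry with ^ and $ is an assumption about the data.

```powershell
$CCI_ID = @(
    "003417 AR-2.1"
    "003425 AR-2.9"
    "003392 AP-1.12"
    "009012 APP-1(21).1"
)

# Named groups (names chosen for illustration): id = leading digits, ref = identifier
$pattern = '^(?<id>\d+)\s+(?<ref>[a-zA-Z]{2,}-\d(?:\(\d+\))?\.\d+)$'

$output = foreach ($entry in $CCI_ID) {
    if ($entry -match $pattern) {
        # $Matches is filled in by a successful -match on a single string
        'CCI-' + $Matches['id']
        $Matches['ref']
    }
    else {
        Write-Warning "No match: $entry"
    }
}

$output
```

Matching per entry makes it easy to report entries that do not fit the expected format instead of silently skipping them.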