I've been banging my head against this one all day, and I'm getting close but not quite there. I have extracted just the regex part from my much larger script. Here is the script so far:
$CCI_ID = @(
"003417 AR-2.1"
"003425 AR-2.9"
"003392 AP-1.12"
"009012 APP-1(21).1"
)
[regex]::matches($CCI_ID, '(\d{1,})|([a-zA-Z]{2}[-][\d][\(?\){0,1}[.][\d]{1,})') |
ForEach-Object {
    if ($_.Groups[1].Value.Length -gt 0) {
        Write-Host $('CCI-' + $_.Groups[1].Value.Trim())
    }
    else { $_.Groups[2].Value.Trim() }
}
Output:
CCI-003417
AR-2.1
CCI-003425
AR-2.9
CCI-003392
AP-1.12
CCI-009012
PP-1(21
CCI-1
The output is correct for all but the last one. It should be:
CCI-009012
APP-1(21).1
Thanks for any advice.
CodePudding user response:
Instead of describing and quantifying the (optional) opening and closing parenthesis separately, group them together and then make the whole group optional:
(?:\(\d+\))?
The whole pattern thus ends up looking like:
[regex]::Matches($CCI_ID, '(\d{1,})|([a-zA-Z]{2,3}[-][\d](?:\(\d+\))?[.][\d]{1,})')
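As a quick check of the optional group, here is a minimal sketch that tests the second alternative on its own against values with and without the parenthesized part. The names $pattern, $samples and $results are only illustrative, the ^ and $ anchors are added for a whole-string test, and the last sample is a made-up malformed value:

```powershell
# Second alternative from the pattern above, anchored so the whole string must fit.
$pattern = '^[a-zA-Z]{2,3}-\d(?:\(\d+\))?\.\d+$'

# The last entry is a hypothetical malformed value (missing closing parenthesis).
$samples = 'AR-2.1', 'AP-1.12', 'APP-1(21).1', 'APP-1(21.1'

$results = foreach ($s in $samples) {
    # -match returns $true when the string fits the pattern
    [pscustomobject]@{ Value = $s; IsMatch = ($s -match $pattern) }
}
$results
```

The first three values should report a match, while the malformed one should not, because the opening parenthesis is only accepted together with its digits and closing parenthesis.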
CodePudding user response:
As you are experiencing here, regular expressions can become very complex and unreadable.
It is therefore often a good idea to view your problem from two different angles:
- Try matching the part(s) you want, or
- Try matching the part(s) you don't want
In your case it is probably easier to match the part that you don't want (the delimiter, i.e. the space) and split your string on that, which is apparently what you want to achieve:
$CCI_ID | ForEach-Object {
    $Split = $_ -split '\s+', 2
    'CCI-' + $Split[0]
    $Split[1]
}
$_ -split '\s+', 2 splits the string on one or more whitespace characters (you might also consider a literal space: -split ' '). The , 2 prevents the string from being split into more than two parts, meaning that the second part will not be split further even if it contains spaces.
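To illustrate the effect of the limit, here is a small sketch with a hypothetical input whose second part contains an extra space (the value and the variable names are made up for the example):

```powershell
# Hypothetical input: the second part itself contains a space.
$line = '009012 APP-1(21).1 draft'

$unlimited = $line -split '\s+'      # splits at every run of whitespace
$limited   = $line -split '\s+', 2   # stops after the first split

$unlimited.Count   # 3
$limited.Count     # 2
$limited[1]        # APP-1(21).1 draft
```

With the limit in place, everything after the first delimiter stays together in the second element.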
CodePudding user response:
In your pattern you are using an alternation (|), but judging by the example data you can instead match the digits, followed by one or more whitespace characters, followed by the identifier.
If the pattern matches, group 1 always contains one or more digits, so you don't have to check Value.Length.
The pattern with the optional digits between parentheses:
\b(\d+)\s+([a-zA-Z]{2,}-\d(?:\(\d+\))?\.\d+)\b
See a regex101 demo.
$CCI_ID = @(
"003417 AR-2.1"
"003425 AR-2.9"
"003392 AP-1.12"
"009012 APP-1(21).1"
)
[regex]::Matches($CCI_ID, '\b(\d+)\s+([a-zA-Z]{2,}-\d(?:\(\d+\))?\.\d+)\b') |
ForEach-Object {
    Write-Host $('CCI-' + $_.Groups[1].Value.Trim())
    Write-Host $_.Groups[2].Value.Trim()
}
Output
CCI-003417
AR-2.1
CCI-003425
AR-2.9
CCI-003392
AP-1.12
CCI-009012
APP-1(21).1