Select-String pattern finds only partial string match of -cmatch

Time:07-26

I am trying to put together a string replacement routine. I have got as far as isolating the substring matches for two strings stored in an array of strings, $lines, but there is a problem:

[string[]]$lines = "160 FROG Kermit  164 Big Bird_Road, Wellsville Singer","161 PIGGY Miss Pretty 1640 Really Long Main_Road, Whathellville Prima Donna"
# match string from last number to comma
foreach ($line in $lines) {
    if ($line -cmatch '\d\s\w[a-z]*\s.*,') {
        Write-Host "Found match!"
        $line | Select-String -Pattern '\d\s\w[a-z]*\s.' -AllMatches |
            ForEach-Object { 
                $x = $_.Matches[1].Value 
                Write-Host "x is:" $x
            }
    }
}

The first regex, in $line -cmatch '\d\s\w[a-z]*\s.*,', is correct according to testing in Expresso. I want the address part of the string, from the last street number to the comma. I am looking to replace the spaces in the street base name with underscores, e.g. Big Bird_Road with Big_Bird_Road and Really Long Main_Road with Really_Long_Main_Road.

The problem is that the second regex, in $line | Select-String -Pattern '\d\s\w[a-z]*\s.' -AllMatches |, cannot be completed. As it stands here, the output is:

Found match!
x is: 4 Big B
Found match!
x is: 0 Really L

The full substring has not been captured yet! And if I add the remaining *, to the pattern, I get no output at all after x is:

Why doesn't the first regex (used with -cmatch) work in the same way when used as a Select-String pattern?

CodePudding user response:

If you want to do a replacement for that format in the strings, you could use -replace with a pattern that matches only the spaces that should become underscores:

(?<=\d\s+\w[a-zA-Z\s_]*)\s(?=[^\d,]*,)

Explanation

  • (?<= Positive lookbehind to assert what to the left is
    • \d\s+\w[a-zA-Z\s_]* Match a digit, 1+ whitespace chars, a word char and optionally repeat the listed characters in the character class
  • ) Close the lookbehind
  • \s Match a whitespace char (or \s+ to match 1 or more)
  • (?=[^\d,]*,) Assert a comma to the right after matching optional chars other than a digit or comma
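
To see which whitespace characters the two lookarounds actually select, the pattern can be run through [regex]::Matches on one of the sample lines. This is only a small sketch: it assumes the quantifier after the first \s is +, and the variable names ($pattern, $found, $before, $after) are illustrative rather than part of the answer.

```powershell
# Sample line taken from the question.
$line = '160 FROG Kermit  164 Big Bird_Road, Wellsville Singer'

# Lookbehind: digit, whitespace, then letters/whitespace/underscores.
# Lookahead: no digit or comma before the next comma.
$pattern = '(?<=\d\s+\w[a-zA-Z\s_]*)\s(?=[^\d,]*,)'

# Each match is a single whitespace character.
$found = [regex]::Matches($line, $pattern)

foreach ($m in $found) {
    # Show the text on either side of the matched whitespace.
    $before = $line.Substring(0, $m.Index)
    $after  = $line.Substring($m.Index + 1)
    Write-Host ("index {0}: '{1}' | '{2}'" -f $m.Index, $before, $after)
}
```

On this line only the space between Big and Bird_Road qualifies. The spaces before 164 fail the lookahead because a digit sits between them and the comma, and the space right after 164 fails the lookbehind because it is not preceded by a word character that follows a digit and whitespace.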

[string[]]$lines = "160 FROG Kermit  164 Big Bird_Road, Wellsville Singer","161 PIGGY Miss Pretty 1640 Really Long Main_Road, Whathellville Prima Donna"

foreach ($line in $lines) {
    $line -replace "(?<=\d\s+\w[a-zA-Z\s_]*)\s(?=[^\d,]*,)","_"
}

Output

160 FROG Kermit  164 Big_Bird_Road, Wellsville Singer
161 PIGGY Miss Pretty 1640 Really_Long_Main_Road, Whathellville Prima Donna
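
As for why the same pattern behaves differently in Select-String: -cmatch is case-sensitive, whereas Select-String matches case-insensitively unless -CaseSensitive is given. Without that switch, [a-z]* also consumes the uppercase letters of FROG and PIGGY, so the first match starts at the first number on the line. In addition, Matches[1] is the second match, not the first, which is why the full pattern (a single match per line) printed nothing. A minimal sketch, assuming the goal is still the substring from the last street number to the comma; $pattern, $result and $addresses are illustrative names:

```powershell
[string[]]$lines = "160 FROG Kermit  164 Big Bird_Road, Wellsville Singer",
                   "161 PIGGY Miss Pretty 1640 Really Long Main_Road, Whathellville Prima Donna"

# The question's full pattern, unchanged.
$pattern = '\d\s\w[a-z]*\s.*,'

$addresses = foreach ($line in $lines) {
    # -CaseSensitive makes Select-String treat [a-z] the way -cmatch does.
    $result = $line | Select-String -Pattern $pattern -CaseSensitive
    if ($result) {
        # Matches[0] is the first (and here the only) match on the line.
        $result.Matches[0].Value
    }
}

$addresses | ForEach-Object { Write-Host "x is:" $_ }
```

With the switch in place, the captured values are 4 Big Bird_Road, and 0 Really Long Main_Road, (each including the trailing comma), the same substrings that -cmatch leaves in $Matches[0].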