Not understanding group/value/capture attributes of Powershell object matches method

Time:11-16

Because of my lack of understanding of PowerShell objects, my question may not be worded accurately. From the PowerShell 7.3 ForEach-Object documentation I gather that I am using a script block and the PowerShell automatic variable $_, but that is about as far as those docs go in relation to my example.

I'm trying to access each of the two parts of a collection of name/address listings taken from a text file: namely the first three listings (001 - 003) or the second three (004 - 006).

Using $regexListings and $testListings, I have tested that I can access the first three or the second three listings by referring to the capture groups, e.g. $1 and $2. See this example running here: regex101

When I run the following Powershell code:

$regexListings = '(?s)(001.*?003.*?$)|(004.*?006.*?$)'

$testListings = 
'001 AALTON Alan 25 Every Street 
002 BROWN James 101 Browns Road 
003 BROWN Jemmima 101 Browns Road
004 BROWN John 101 Browns Road 
005 CAMPBELL Colin 57 Camp Avenue
006 DONNAGAN Dolores 11 Main Road'

$testListings | Select-String -AllMatches -Pattern $regexListings | ForEach-Object {$_.Matches}

Output is:

Groups    : {0, 1, 2}
Success   : True
Name      : 0
Captures  : {0}
Index     : 0
Length    : 204
Value     : 001 AALTON Alan 25 Every Street
            002 BROWN James 101 Browns Road
            003 BROWN Jemmima 101 Browns Road
            004 BROWN John 101 Browns Road
            005 CAMPBELL Colin 57 Camp Avenue
            006 DONNAGAN Dolores 11 Main Road
ValueSpan :

My interpretation of the Powershell output is:

  • there are 3 match groups?
  • no captures available
  • the value is all of it?

Why does the PowerShell script output Captures {0} when the linked page (regex101) above describes two capture groups which I can access?

The documentation Groups, Captures, and Substitutions is helpful but doesn't address this kind of issue. I have gone on using trial & error examples like:

ForEach-Object {$_.Matches.Groups}
ForEach-Object {$_.Matches.Captures}
ForEach-Object {$_.Matches.Value}

And I'm still none the wiser.

CodePudding user response:

Information overload. What's being output is the default view of the match object, showing what's considered relevant to us, the administrators. Capture group 0 is always the entire match, and since $regexListings does match the entire string, its value is the whole text. This is where PowerShell attempts to be helpful with its rich type system and displays what we may find useful, although this may just be how the creators of the cmdlet implemented it. So you were on the right track with $_.Matches.Groups, which should have exposed the capture groups and the values they matched.
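To make the Captures {0} line less mysterious, here is a small sketch (not from the original post) that calls the .NET [regex]::Match method directly on the same pattern and data. It assumes PowerShell 7, where groups have a Name property, and it joins the lines with "`n" so the result doesn't depend on how the file was saved. As far as I understand it, a match is itself a capture, so its Captures collection holds just the whole match, while the numbered groups from the pattern are found under Groups.

```powershell
# Same pattern and listings as in the question.
$regexListings = '(?s)(001.*?003.*?$)|(004.*?006.*?$)'
$testListings = @(
    '001 AALTON Alan 25 Every Street'
    '002 BROWN James 101 Browns Road'
    '003 BROWN Jemmima 101 Browns Road'
    '004 BROWN John 101 Browns Road'
    '005 CAMPBELL Colin 57 Camp Avenue'
    '006 DONNAGAN Dolores 11 Main Road'
) -join "`n"

$match = [regex]::Match($testListings, $regexListings)

# The match's own Captures collection holds one item: the whole match.
$match.Captures.Count                        # 1
$match.Captures[0].Value -eq $match.Value    # True

# The groups defined in the pattern are listed under Groups.
# Group 2 is present but unsuccessful, because the first alternative matched.
$match.Groups | ForEach-Object {
    'Group {0}: Success={1}, Length={2}' -f $_.Name, $_.Success, $_.Length
}
```

So Captures {0} doesn't mean "no captures"; it is the single capture belonging to the overall match.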

If you're looking to access those values, as mentioned above, you'd have to iterate over .Matches.Groups within that ForEach-Object. What you're passing to that cmdlet isn't the individual captures, but rather the match of the expression as a whole. This is why you're better off saving the result to a variable and indexing into the groups, such as $var.Matches.Groups[0], $var.Matches.Groups[1], etc. To get some of the confusion out of the way, you can also use the automatic variable $Matches, which is populated by the -match operator, and index into the captures with $Matches[n] instead. Using your same example:

$regexListings = '(?s)(001.*?003.*?$)|(004.*?006.*?$)'

$testListings = 
'001 AALTON Alan 25 Every Street 
002 BROWN James 101 Browns Road 
003 BROWN Jemmima 101 Browns Road
004 BROWN John 101 Browns Road 
005 CAMPBELL Colin 57 Camp Avenue
006 DONNAGAN Dolores 11 Main Road'

$testListings -match $regexListings 
$Matches

Which outputs:

True # this is output by -match letting you know it's succeeded in matching.

Name                           Value                                                                            
----                           -----                                                                            
1                              001 AALTON Alan 25 Every Street ...                                              
0                              001 AALTON Alan 25 Every Street ... 

Now you have a hashtable that gives a more representative view of the pattern matching.
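As a rough illustration of working with that hashtable, the sketch below indexes $Matches by group number. It reuses the pattern and data from above (lines joined with "`n" so it runs the same everywhere); the $keys variable is just a name picked for this example. Keep in mind that -match stops at the first match, and that a group which didn't take part in the match gets no entry at all.

```powershell
$regexListings = '(?s)(001.*?003.*?$)|(004.*?006.*?$)'
$testListings = @(
    '001 AALTON Alan 25 Every Street'
    '002 BROWN James 101 Browns Road'
    '003 BROWN Jemmima 101 Browns Road'
    '004 BROWN John 101 Browns Road'
    '005 CAMPBELL Colin 57 Camp Avenue'
    '006 DONNAGAN Dolores 11 Main Road'
) -join "`n"

if ($testListings -match $regexListings) {
    # Keys present: 0 (the whole match) and 1 (the first alternative).
    $keys = $Matches.Keys | Sort-Object
    $keys -join ', '            # 0, 1

    # Group 2 did not take part in the match, so it has no entry.
    $Matches.ContainsKey(2)     # False

    # Index by group number to get the captured text.
    $Matches[1]
}
```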

CodePudding user response:

In order to access each of two parts of the listings I needed to be able to see them in the output using:

$regexListings = '(?ms)(001.*?003.*?$)|(004.*?006.*?$)'

$testListings | Select-String -AllMatches -Pattern $regexListings | ForEach-Object {$_.Matches.Captures}


Groups    : {0, 1, 2}
Success   : True
Name      : 0
Captures  : {0}
Index     : 0
Length    : 102
Value     : 001 AALTON Alan 25 Every Street
            002 BROWN James 101 Browns Road
            003 BROWN Jemmima 101 Browns Road
ValueSpan : 

Groups    : {0, 1, 2}
Success   : True
Name      : 0
Captures  : {0}
Index     : 103
Length    : 101
Value     : 004 BROWN John 101 Browns Road
            005 CAMPBELL Colin 57 Camp Avenue
            006 DONNAGAN Dolores 11 Main Road
ValueSpan :

The differences from the code in the question are:

  • using multi line modifier (?ms) instead of (?s) in the regex
  • using {$_.Matches.Captures} as the regex contains capture grouping
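To see what the modifier change does on its own, here is a quick comparison (a sketch using [regex]::Matches rather than Select-String; the $single and $multi names are made up for this example). As far as I can tell, the m flag is what matters most: it lets $ match at the end of each line instead of only at the end of the string, so the first alternative stops after listing 003 and leaves 004 - 006 for a second match.

```powershell
$testListings = @(
    '001 AALTON Alan 25 Every Street'
    '002 BROWN James 101 Browns Road'
    '003 BROWN Jemmima 101 Browns Road'
    '004 BROWN John 101 Browns Road'
    '005 CAMPBELL Colin 57 Camp Avenue'
    '006 DONNAGAN Dolores 11 Main Road'
) -join "`n"

# (?s): $ only matches at the end of the string, so one match swallows everything.
$single = [regex]::Matches($testListings, '(?s)(001.*?003.*?$)|(004.*?006.*?$)')

# (?ms): $ also matches at the end of each line, so the text splits into two matches.
$multi = [regex]::Matches($testListings, '(?ms)(001.*?003.*?$)|(004.*?006.*?$)')

$single.Count    # 1
$multi.Count     # 2

# Show where each of the two matches starts and how long it is.
$multi | ForEach-Object { 'Index={0}, Length={1}' -f $_.Index, $_.Length }
```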

These captures can be accessed by assigning the result to a variable and then indexing, e.g.:

$result = $testListings | Select-String -AllMatches -Pattern $regexListings | ForEach-Object {$_.Matches.Captures}
$result[1]
$result[0]
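If you want the text of a particular capture group rather than the whole match object, something along these lines should work. It is only a sketch: it collects $_.Matches (not .Captures) and then reads the numbered groups from each match, with the listings joined by "`n" so the example behaves the same regardless of line endings.

```powershell
$regexListings = '(?ms)(001.*?003.*?$)|(004.*?006.*?$)'
$testListings = @(
    '001 AALTON Alan 25 Every Street'
    '002 BROWN James 101 Browns Road'
    '003 BROWN Jemmima 101 Browns Road'
    '004 BROWN John 101 Browns Road'
    '005 CAMPBELL Colin 57 Camp Avenue'
    '006 DONNAGAN Dolores 11 Main Road'
) -join "`n"

# One match object per match found in the input string.
$result = $testListings |
    Select-String -AllMatches -Pattern $regexListings |
    ForEach-Object { $_.Matches }

# The first match came from the first alternative, so its text is in group 1.
$result[0].Groups[1].Value

# The second match came from the second alternative, so its text is in group 2.
$result[1].Groups[2].Value

# The group that did not take part in a match reports Success = False.
$result[1].Groups[1].Success    # False
```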