How to Access Powershell Select-String Return Values

Time:04-05

I have a header section at the beginning of a text file test.txt:

BLOGS-BODGER
SMALLTOWN COMPOSITE ROLL
PAGE 6
PAGE 7
SMALLTOWN COMPOSITE ROLL
BOOMER-BYGRAVE

I am extracting data from this header in the following way:

$filesPathText = "C:\test\test3\test.txt"

# find NAME-NAME strings in text file test.txt
$names = Select-String -Path $filesPathText -Pattern '[A-Z]-[A-Z]'
Write-Host "First name" $names[0] "Second name" $names[1]

# find PAGE X strings in text file test.txt
$pages = Select-String -Path $filesPathText -Pattern 'PAGE'
Write-Host "First page" $pages[0] "Second page" $pages[1]

The output is the following:

First name C:\test\test3\test.txt:3:BLOGS-BODGER Second name C:\test\test3\test.txt:8:BOOMER-BYGRAVE
First page C:\test\test3\test.txt:5:PAGE 6 Second page C:\test\test3\test.txt:6:PAGE 7

I can access line numbers in the following way:

[int]$lineNum = $names[1].LineNumber

So how do I do something similar for the NAME & PAGE values? That is, how do I assign the NAME & PAGE data to a variable without the path & line number data?

The Select-String documentation comes close (Example 3: Find a pattern match), but it doesn't go into parsing out individual elements. The SO post "Get a line number on Powershell?" mentions piping the output to Select-Object and expanding properties. Is that the right direction?

Any suggestions/explanations would be appreciated.

Answer:

Santiago Squarzon has provided the crucial pointer in a comment:

  • To obtain the line text from a [Microsoft.PowerShell.Commands.MatchInfo] instance that Select-String outputs for each match, access its .Line property.

    • Note: If you're only looking for the line text, PowerShell (Core) 7 offers a simpler solution, namely the -Raw switch.

    • The property name Line can be misleading, because, strictly speaking, it is the entire text of the matching input string, which, depending on how input is provided, may itself be composed of multiple lines.[1]

  • To obtain which search pattern matched - which is only of interest if multiple patterns were passed - use the .Pattern property.

  • To obtain only the matching part of a line, i.e. the part that matched the search pattern, use .Matches.Value (or, more strictly, .Matches[0].Value).

    • Note: .Matches is an array of [System.Text.RegularExpressions.Match] instances, but that array only ever contains multiple elements if -AllMatches was also specified, in order to request potentially multiple matches per line (per input object).

    • If your search regex(es) contain capture groups (subexpressions enclosed in (...)), you can access what they captured via the .Matches[0].Groups property.[2]

The following illustrates all three properties. The regex pag. is used to match the verbatim string PAGE (case-insensitively) in order to show the difference between .Pattern and .Matches.Value; the values are enclosed in [...] for delineation:

'PAGE 6' | Select-String -Pattern pag. | ForEach-Object {
  [pscustomobject] @{
    Line = '[{0}]' -f $_.Line
    Pattern = '[{0}]' -f $_.Pattern    
    MatchingLinePart = '[{0}]' -f $_.Matches.Value
  }
}

Output:

Line     Pattern MatchingLinePart
----     ------- ----------------
[PAGE 6] [pag.]  [PAGE]
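
The note above about -AllMatches can be sketched as follows. This is a minimal illustration rather than part of the original answer; the input string 'PAGE 6 PAGE 7' and the variable names $first and $all are made up for the example:

```powershell
# Hypothetical input: one line that contains two matches.
$inputLine = 'PAGE 6 PAGE 7'

# Without -AllMatches, .Matches holds only the first match on the line.
$first = $inputLine | Select-String -Pattern 'PAGE \d+'
$first.Matches.Count

# With -AllMatches, .Matches holds every match on the line.
$all = $inputLine | Select-String -Pattern 'PAGE \d+' -AllMatches
$all.Matches.Count

# Member-access enumeration returns the text of each match.
$all.Matches.Value
```

In the second case, indexing is what distinguishes the matches: $all.Matches[0].Value is the first match and $all.Matches[1].Value is the second.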

[1] E.g. ("a`nb" | Select-String a).Line outputs the full two-line input string, because it was provided as a single input object.

[2] E.g., 'PAGE 6' | Select-String 'page (\d+)' | ForEach-Object { $_.Matches[0].Groups[1].Value } outputs the string 6; index 1 refers to the first (and only) capture group ((...)) in the regex.
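
Applied to the header from the question, the properties above could be used roughly like this. It is a sketch under assumptions: a temporary file stands in for test.txt, and the names $tempFile and $pageNumbers as well as the capture-group pattern 'PAGE (\d+)' are introduced here for illustration only:

```powershell
# Hypothetical stand-in for test.txt: write the question's header to a temp file.
$tempFile = Join-Path ([System.IO.Path]::GetTempPath()) 'select-string-demo.txt'
Set-Content -LiteralPath $tempFile -Value @(
  'BLOGS-BODGER'
  'SMALLTOWN COMPOSITE ROLL'
  'PAGE 6'
  'PAGE 7'
  'SMALLTOWN COMPOSITE ROLL'
  'BOOMER-BYGRAVE'
)

# .Line is the text of each matching line, without the path or line number.
$names = (Select-String -LiteralPath $tempFile -Pattern '[A-Z]-[A-Z]').Line

# A capture group isolates just the page number; cast it to [int] if needed.
$pageNumbers = Select-String -LiteralPath $tempFile -Pattern 'PAGE (\d+)' |
  ForEach-Object { [int] $_.Matches[0].Groups[1].Value }

Write-Host "First name: $($names[0]); second name: $($names[1])"
Write-Host "First page: $($pageNumbers[0]); second page: $($pageNumbers[1])"

# Clean up the temporary file.
Remove-Item -LiteralPath $tempFile
```

If the whole line is wanted for the pages as well (PAGE 6 rather than 6), .Line can be used there in the same way as for the names.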


Discovering a cmdlet's output data type:

  • Via Get-Member:

    • Pipe a concrete call to Get-Member to discover that call's output type and its members; add -Type Properties to limit the display of members; e.g.:

      PS> 'foo' | Select-String foo | Get-Member -Type Properties 
      
          TypeName: Microsoft.PowerShell.Commands.MatchInfo
      
       Name       MemberType Definition
       ----       ---------- ----------
       Context    Property   Microsoft.PowerShell.Commands.MatchInfoContext Context {get;set;}
       Filename   Property   string Filename {get;}
       IgnoreCase Property   bool IgnoreCase {get;set;}
       Line       Property   string Line {get;set;}
       LineNumber Property   int LineNumber {get;set;}
       Matches    Property   System.Text.RegularExpressions.Match[] Matches {get;set;}
       Path       Property   string Path {get;set;}
       Pattern    Property   string Pattern {get;set;}
      
  • Via a cmdlet's documentation: A cmdlet's help contains an OUTPUTS section that describes the .NET data type(s) of the objects output by that cmdlet.

    • To see this section locally, you must invoke Get-Help with the -Full switch, e.g. Get-Help Select-String -Full

      • -Full results in lengthy output in which the OUTPUTS section may get buried; to isolate it, use something like the following:

        (Get-Help Select-String -Full | Out-String) -replace '(?sm).+^(OUTPUTS.+?)^\S.+$', '$1'
        
    • Note that a given cmdlet may situationally produce different output types; e.g., with the -Quiet switch Select-String emits a Boolean.
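
The point about situational output types can be checked directly with .GetType(). The following is a small sketch, not from the original answer; the variable names are illustrative, and the -Raw call is guarded because that switch is only available in PowerShell (Core) 7 and later:

```powershell
# Default: Select-String emits a MatchInfo object for each match.
$default = 'foo' | Select-String foo
$default.GetType().FullName

# With -Quiet, a Boolean is emitted instead.
$quiet = 'foo' | Select-String foo -Quiet
$quiet.GetType().FullName

# With -Raw (PowerShell 7+ only), the matching line is emitted as a plain string.
if ($PSVersionTable.PSVersion.Major -ge 7) {
  $raw = 'foo' | Select-String foo -Raw
  $raw.GetType().FullName
}
```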
