I have a header section at the beginning of a text file, test.txt:
BLOGS-BODGER
SMALLTOWN COMPOSITE ROLL
PAGE 6
PAGE 7
SMALLTOWN COMPOSITE ROLL
BOOMER-BYGRAVE
I am extracting data from this header in the following way:
$filesPathText = "C:\test\test3\test.txt"
# find NAME-NAME strings in text file test.txt
$names = Select-String -Path $filesPathText -Pattern '[A-Z]-[A-Z]'
Write-Host "First name" $names[0] "Second name" $names[1]
# find PAGE X strings in text file test.txt
$pages = Select-String -Path $filesPathText -Pattern 'PAGE'
Write-Host "First page" $pages[0] "Second page" $pages[1]
The output is the following:
First name C:\test\test3\test.txt:3:BLOGS-BODGER Second name C:\test\test3\test.txt:8:BOOMER-BYGRAVE
First page C:\test\test3\test.txt:5:PAGE 6 Second page C:\test\test3\test.txt:6:PAGE 7
I can access line numbers in the following way:
[int]$lineNum = $names[1].LineNumber
So how do I do something similar for the NAME and PAGE values? That is, how do I assign the NAME and PAGE data to a variable without the path and line number data?
This doc comes close (Example 3: Find a pattern match) but doesn't go into parsing out individual elements. This SO post, "Get a line number on Powershell?", has a reference to piping output to Select-Object and expanding properties.
Any suggestions/explanations would be appreciated.
Answer:
Santiago Squarzon has provided the crucial pointer in a comment:
To obtain the line text from a [Microsoft.PowerShell.Commands.MatchInfo] instance that Select-String outputs for each match, access its .Line property.
Note: If you're only looking for the line text, PowerShell (Core) 7 offers a simpler solution, namely the -Raw switch.
The property name Line can be misleading, because, strictly speaking, it is the entire text of the matching input string, which, depending on how input is provided, may itself be composed of multiple lines.[1]
To obtain which search pattern matched, which is only of interest if multiple patterns were passed, use the .Pattern property.
To obtain only the matching part of a line, i.e. the part that matched the search pattern, use .Matches.Value (or, more strictly, .Matches[0].Value).
Note: .Matches is an array of [System.Text.RegularExpressions.Match] instances, but that array only ever contains multiple elements if -AllMatches was also specified, in order to request potentially multiple matches per line (per input object).
If your search regex(es) contain capture groups (subexpressions enclosed in (...)), you can access what they captured via the .Matches[0].Groups property.[2]
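As a minimal sketch of the -AllMatches behavior described above, the following compares .Matches with and without the switch. The input string is taken from the question's header; the pattern '[A-Z]+' and the variable names are chosen here purely for illustration.

```powershell
# Without -AllMatches, .Matches holds only the first match per input string.
$first = 'BLOGS-BODGER' | Select-String -Pattern '[A-Z]+'
$firstCount = $first.Matches.Count      # 1
$firstValue = $first.Matches[0].Value   # BLOGS

# With -AllMatches, .Matches holds every match found in the input string.
$all = 'BLOGS-BODGER' | Select-String -Pattern '[A-Z]+' -AllMatches
$allValues = $all.Matches.Value         # BLOGS, BODGER
```

Accessing .Value directly on the .Matches array relies on PowerShell's member-access enumeration, which returns the .Value of each element in turn.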
To illustrate all three: the regex pag. is used to (case-insensitively) match the verbatim string PAGE, to illustrate the difference between .Pattern and .Matches.Value; also, the values are enclosed in [...] for delineation:
'PAGE 6' | Select-String -Pattern pag. | ForEach-Object {
  [pscustomobject] @{
    Line             = '[{0}]' -f $_.Line
    Pattern          = '[{0}]' -f $_.Pattern
    MatchingLinePart = '[{0}]' -f $_.Matches.Value
  }
}
Output:
Line     Pattern MatchingLinePart
----     ------- ----------------
[PAGE 6] [pag.]  [PAGE]
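Applied to the header from the question, the same properties might be used roughly as follows. This is only a sketch: the temporary file name test-header.txt and the capture-group pattern 'PAGE (\d+)' are assumptions made here so that the example is self-contained, not part of the original code.

```powershell
# Recreate the question's header in a temporary file (hypothetical file name).
$filesPathText = Join-Path ([System.IO.Path]::GetTempPath()) 'test-header.txt'
@'
BLOGS-BODGER
SMALLTOWN COMPOSITE ROLL
PAGE 6
PAGE 7
SMALLTOWN COMPOSITE ROLL
BOOMER-BYGRAVE
'@ | Set-Content -Path $filesPathText

# .Line yields the text of each matching line, without path or line number.
$names = (Select-String -Path $filesPathText -Pattern '[A-Z]-[A-Z]').Line
$firstName, $secondName = $names[0], $names[1]

# A capture group isolates the page number; Groups[1] is the first group.
$pages = Select-String -Path $filesPathText -Pattern 'PAGE (\d+)' |
  ForEach-Object { [int] $_.Matches[0].Groups[1].Value }

Write-Host "First name $firstName Second name $secondName"
Write-Host "First page $($pages[0]) Second page $($pages[1])"

# Clean up the temporary file.
Remove-Item -Path $filesPathText
```

If the whole line is wanted rather than just the number (PAGE 6 instead of 6), .Line can be used for the pages in the same way as for the names.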
[1] E.g., ("a`nb" | Select-String a).Line outputs the full two-line input string, because it was provided as a single input object.
[2] E.g., 'PAGE 6' | Select-String 'page (\d+)' | ForEach-Object { $_.Matches[0].Groups[1].Value } outputs string 6; index 1 refers to the first (and only) capture group ((...)) in the regex.
Discovering a cmdlet's output data type:
Via Get-Member: Pipe a concrete call to Get-Member to discover that call's output type and its members; add -Type Properties to limit the display of members; e.g.:
PS> 'foo' | Select-String foo | Get-Member -Type Properties

   TypeName: Microsoft.PowerShell.Commands.MatchInfo

Name       MemberType Definition
----       ---------- ----------
Context    Property   Microsoft.PowerShell.Commands.MatchInfoContext Context {get;set;}
Filename   Property   string Filename {get;}
IgnoreCase Property   bool IgnoreCase {get;set;}
Line       Property   string Line {get;set;}
LineNumber Property   int LineNumber {get;set;}
Matches    Property   System.Text.RegularExpressions.Match[] Matches {get;set;}
Path       Property   string Path {get;set;}
Pattern    Property   string Pattern {get;set;}
Via a cmdlet's documentation: A cmdlet's help contains an OUTPUTS section that describes the .NET data type(s) of the objects output by that cmdlet.
To see this section locally, you must invoke Get-Help with the -Full switch, e.g. Get-Help Select-String -Full
-Full results in lengthy output in which the OUTPUTS section may get buried; to isolate it, use something like the following:
(Get-Help Select-String -Full | Out-String) -replace '(?sm).+^(OUTPUTS.+?)^\S.+$', '$1'
Note that a given cmdlet may situationally produce different output types; e.g., with the -Quiet switch Select-String emits a Boolean.
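A quick way to see this situational difference is to call .GetType() on the output of a matching call with and without -Quiet. The sketch below uses illustrative variable names and only covers the case where the pattern is found.

```powershell
# Default output: one MatchInfo object per match.
$matchInfo = 'foo' | Select-String -Pattern foo
$matchType = $matchInfo.GetType().FullName   # Microsoft.PowerShell.Commands.MatchInfo

# With -Quiet: a Boolean when the pattern is found.
$quiet = 'foo' | Select-String -Pattern foo -Quiet
$quietType = $quiet.GetType().FullName       # System.Boolean

Write-Host "Default: $matchType; with -Quiet: $quietType"
```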