Understanding access of object attributes in PowerShell scripting

Time:01-06

First, I'm trying to understand this. Second, I would like to use it.

 # test string
$pgNumString = 'C:\test\test5\AALTONEN-ALLAN_PENCARROW_PAGE_1.txt'

# Regex with capture group for number '1' ONLY from $pgNumString
# In other use cases it may be page 10 or any page in 100s
$pgNumRegex = "(?s)_(\d+)\."

# Simplest - not using -SimpleMatch because this example uses regex (Select-String docs)
$pgNum = $pgNumString | Select-String -Pattern $pgNumRegex -AllMatches 

The match is not assigned to $pgNum. No capture grouping means no good anyway. A slightly more sophisticated attempt:

$pgNum = $pgNumString | Select-String -Pattern $pgNumRegex -AllMatches | Select-Object {$_.Matches.Groups[1].Value} 

Output:

$_.Matches.Groups[1].Value
--------------------------
1

The match is still not assigned to $pgNum. But the output shows I'm on the right track. What am I doing wrong?

Answer:

Especially if you're dealing with strings already in memory, but often also with files (except if they're exceptionally large), use of Select-String isn't necessary and both slows down and complicates the solution, as your example shows.

While -match works in principle too (with the regex written to match only what should be extracted), it is limited to one match, whose results are reflected in the automatic $Matches variable.
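To illustrate the one-match limitation, here is a minimal sketch, assuming a single path string like the one in the question. The variable name $path is only for illustration, and the regex uses the question's capture-group form rather than look-arounds:

```powershell
# Assumption: a single in-memory string, taken from the question's sample.
$path = 'C:\test\test5\AALTONEN-ALLAN_PENCARROW_PAGE_1.txt'

# With a scalar left-hand side, -match returns $true or $false and, on success,
# fills the automatic $Matches hashtable with the FIRST match only.
if ($path -match '_(\d+)\.') {
    $pgNum = $Matches[1]   # text captured by group 1
}

$pgNum   # -> '1'
```

If the input contained several page numbers, only the first would show up in $Matches, which is why [regex]::Matches() is the better fit for multiple matches.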

However, you can make direct use of an underlying .NET API, namely [regex]::Matches().

# Sample input.
$pgNumString = @'
C:\test\test5\AALTONEN-ALLAN_PENCARROW_PAGE_1.txt
C:\test\test6\AALTONEN-ALLAN_PENCARROW_PAGE_42.txt
'@

# -> '1', '42'
# Note: To match PowerShell's case-*insensitive* behavior (not relevant here), use:
#  [regex]::Matches($pgNumString, '(?<=_)\d+(?=\.)', 'IgnoreCase').Value
[regex]::Matches($pgNumString, '(?<=_)\d+(?=\.)').Value

As an aside:

  • Bringing the functionality of [regex]::Matches() natively to PowerShell in the future, in the form of a -matchall operator, is the subject of GitHub issue #7867.

Note that I've modified your regex to use look-around assertions so that what it captures consists solely of the substring to extract, reflected in the .Value property.
For an explanation of the regex and the ability to experiment with it, see this regex101.com page.
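To see why the look-around form matters for .Value, the following sketch compares the two regex styles on the same two-line sample. The variable names $sample, $withGroup and $lookAround are hypothetical:

```powershell
# Assumption: the same two-line sample input as above.
$sample = @'
C:\test\test5\AALTONEN-ALLAN_PENCARROW_PAGE_1.txt
C:\test\test6\AALTONEN-ALLAN_PENCARROW_PAGE_42.txt
'@

# Capture-group regex: .Value is the WHOLE match, delimiters included.
$withGroup = [regex]::Matches($sample, '_(\d+)\.').Value          # -> '_1.', '_42.'

# Look-around regex: '_' and '.' are asserted but not consumed,
# so .Value is just the digits.
$lookAround = [regex]::Matches($sample, '(?<=_)\d+(?=\.)').Value  # -> '1', '42'

$withGroup
$lookAround
```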

Using your original approach requires extra work to extract the capture-group values, with the help of the intrinsic .ForEach() method:

[regex]::Matches($pgNumString, '_(\d+)\.').ForEach({ $_.Groups[1].Value })

As for what you tried:

As Santiago notes, you need to use ForEach-Object instead of Select-Object, but there's an additional requirement:

Given your use of -AllMatches, you need to access .Groups[1].Value on each of the matches reported in .Matches, otherwise you'll only get the first match's capture-group value:

$pgNumString | 
  Select-String -Pattern $pgNumRegex -AllMatches |
  ForEach-Object { $_.Matches.ForEach({ $_.Groups[1].Value }) }
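The difference between the two cmdlets can be sketched as follows, assuming the single-path test string from the question. The name $wrapped is hypothetical. Select-Object with a script block creates a calculated property on a new object, whereas ForEach-Object emits the script block's output itself:

```powershell
# Assumption: the question's single-path test string and capture-group regex.
$pgNumString = 'C:\test\test5\AALTONEN-ALLAN_PENCARROW_PAGE_1.txt'
$pgNumRegex  = '_(\d+)\.'

# Select-Object wraps the value in an object with one calculated property,
# so the variable holds that wrapper object rather than the string.
$wrapped = $pgNumString |
  Select-String -Pattern $pgNumRegex -AllMatches |
  Select-Object { $_.Matches.Groups[1].Value }

# ForEach-Object outputs the capture-group value(s) directly.
$pgNum = $pgNumString |
  Select-String -Pattern $pgNumRegex -AllMatches |
  ForEach-Object { $_.Matches.ForEach({ $_.Groups[1].Value }) }

$wrapped -is [string]   # -> False
$pgNum                  # -> '1'
```

If a number rather than a string is needed, a cast such as [int] $pgNum should work for a single match.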

As an aside:

  • Making Select-String return only the matching parts of the input lines / strings, via an -OnlyMatching switch, is a green-lit future enhancement; see GitHub issue #7712.

  • While this wouldn't directly help with capture groups, it is usually possible to reformulate regexes with look-around assertions, as shown with [regex]::Matches() above.
