How to pipe results into output array

Time:10-08

After playing around with a PowerShell script for a while, I was wondering if there is a version of this that doesn't use C#. It feels like I am missing some information on how to pipe things properly.

$packages = Get-ChildItem "C:\Users\A\Downloads" -Filter "*.nupkg" |
    %{ $_.Name } 
    # Select-String -Pattern "(?<packageId>[^\d]+)\.(?<version>[\w\d\.-]+)(?=.nupkg)" |
    # %{ @($_.Matches[0].Groups["packageId"].Value, $_.Matches[0].Groups["version"].Value) } 

         
foreach ($package in $packages){
    
    $match = [System.Text.RegularExpressions.Regex]::Match($package, "(?<packageId>[^\d]+)\.(?<version>[\w\d\.-]+)(?=.nupkg)")
    Write-Host "$($match.Groups["packageId"].Value) - $($match.Groups["version"].Value)"  
}

Originally, I tried to do this with PowerShell only and thought that @(1,2,3) would create an array.

I ended up bypassing the issue by doing the regex matching with C# instead of PowerShell, which works, but I am curious how this would have been done with PowerShell only.

While there are 4 packages, the PowerShell-only version produced 8 lines. So accessing my data like $packages[0][0] to get a package ID never worked, because the 8 lines were strings, whereas I expected 4 arrays to be returned.

Answer:

Terminology note re "without using C#": You mean without direct use of .NET APIs. By contrast, C# is just another .NET-based language that can make use of such APIs, just like PowerShell itself.

Note:

  • The next section answers the following question: How can I avoid direct calls to .NET APIs for my regex-matching code in favor of using PowerShell-native commands (operators, automatic variables)?

  • See the bottom section for the Select-String solution that was your true objective; the tl;dr is:

    # Note the `, `, which ensures that the array is output *as a single object*
    %{ , @($_.Matches[0].Groups["packageId"].Value, $_.Matches[0].Groups["version"].Value) }
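To see the effect of the unary `,` in isolation, here is a minimal sketch that runs four hypothetical file names (made up here, standing in for the Get-ChildItem results) through both variants, using the same regex, and counts what comes out of the pipeline:

```powershell
# Hypothetical file names standing in for the .nupkg files in the Downloads folder.
$names = 'Alpha.Core.1.0.0.nupkg', 'Beta.Tools.2.1.0.nupkg',
         'Gamma.Data.3.0.1-beta.nupkg', 'Delta.Web.4.2.0.nupkg'

$regex = '(?<packageId>[^\d]+)\.(?<version>[\w\d\.-]+)(?=.nupkg)'

# Without the unary comma: each 2-element array is enumerated,
# so its elements are emitted one by one (8 strings in total).
$flat = $names | Select-String $regex | ForEach-Object {
  @($_.Matches[0].Groups['packageId'].Value, $_.Matches[0].Groups['version'].Value)
}

# With the unary comma: each 2-element array is emitted as a single object (4 arrays).
$nested = $names | Select-String $regex | ForEach-Object {
  , @($_.Matches[0].Groups['packageId'].Value, $_.Matches[0].Groups['version'].Value)
}

"Flat:   $($flat.Count) objects"     # Flat:   8 objects
"Nested: $($nested.Count) objects"   # Nested: 4 objects
$nested[0][0]                        # Alpha.Core
```

The 8-versus-4 count is exactly the symptom described in the question: @(...) does create an array, but the pipeline unrolls it on output unless it is wrapped.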
    

The PowerShell-native (near-)equivalent of your code is as follows (note that the assumption is that $package contains a single file name, as in your foreach loop):

# Caveat: -match is case-INSENSITIVE; use -cmatch for case-sensitive matching.
if ($package -match '(?<packageId>[^\d]+)\.(?<version>[\w\d\.-]+)(?=.nupkg)') {
  "$($Matches['packageId']) - $($Matches['Version'])"  
}
  • -match, the regular-expression matching operator, is the equivalent of [System.Text.RegularExpressions.Regex]::Match() (which you can shorten to [regex]::Match()) in that it only looks for (at most) one match.

    • Caveat re case-sensitivity: -match (and its rarely used alias -imatch) is case-insensitive by default, as all PowerShell operators are; for case-sensitive matching, use the c-prefixed variant, -cmatch.

    • By contrast, .NET APIs are case-sensitive by default; you'd have to pass the [System.Text.RegularExpressions.RegexOptions]::IgnoreCase flag to [regex]::Match() for case-insensitive matching (you may use 'IgnoreCase', which PowerShell auto-converts for you).

    • As of PowerShell 7.2.x, there is no operator that is the equivalent of the related return-ALL-matches .NET API, [regex]::Matches(). See GitHub issue #7867 for a green-lit but yet-to-be-implemented proposal to introduce one, named -matchall.

  • However, instead of directly returning an object describing what was (or wasn't) matched, -match returns a Boolean, i.e. $true or $false, to indicate whether matching succeeded.

  • Only if -match returns $true does information about the match become available, namely via the automatic $Matches variable, a hashtable reflecting the matching parts of the input string: entry 0 is always the full match; additional entries reflect what any capture groups ((...)) captured, keyed by index (starting with 1) for anonymous groups or, as in your case, by name for named capture groups ((?<name>...)).

    • Syntax note: Given that PowerShell allows dot notation (property-access syntax) even with hashtables, the above command could have used $Matches.packageId instead of $Matches['packageId'], for instance; this also works with the numeric (index-based) entries, e.g., $Matches.0 instead of $Matches[0].

    • Caveat: If an array (enumerable) is used as the LHS operand, -match's behavior changes:

      • $Matches is not populated.
      • Filtering is performed; that is, instead of returning a Boolean indicating whether matching succeeded, -match returns the subarray of matching input strings.
    • Note that the $Matches hashtable only provides the matched strings, not also metadata such as index and length, as found in [regex]::Match()'s return object, which is of type [System.Text.RegularExpressions.Match].
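The behaviors described above can be checked with a short sketch; the file name and the extra inputs (readme.txt and so on) are hypothetical, chosen only to exercise each case:

```powershell
$name  = 'Alpha.Core.1.0.0.nupkg'   # hypothetical file name
$regex = '(?<packageId>[^\d]+)\.(?<version>[\w\d\.-]+)(?=.nupkg)'

# Scalar LHS: -match returns a Boolean and, on success, populates $Matches.
$ok        = $name -match $regex
$fullMatch = $Matches[0]          # entry 0 is the whole match: 'Alpha.Core.1.0.0'
$id        = $Matches.packageId   # dot notation works with hashtables: 'Alpha.Core'
$version   = $Matches['version']  # index notation: '1.0.0'

# Case sensitivity: PowerShell operators ignore case by default, .NET APIs don't.
$ciOperator = 'NUPKG' -match 'nupkg'                                  # True
$csOperator = 'NUPKG' -cmatch 'nupkg'                                 # False
$csDotNet   = [regex]::Match('NUPKG', 'nupkg').Success                # False
$ciDotNet   = [regex]::Match('NUPKG', 'nupkg', 'IgnoreCase').Success  # True; 'IgnoreCase' is converted to RegexOptions

# Array LHS: -match filters instead of returning a Boolean ($Matches is not updated).
$names    = 'Alpha.Core.1.0.0.nupkg', 'readme.txt', 'Beta.Tools.2.1.0.nupkg'
$filtered = $names -match '\.nupkg$'   # only the two .nupkg names

# [regex]::Match() also exposes metadata that $Matches lacks.
$m = [regex]::Match($name, $regex)
$m.Index    # 0
$m.Length   # 16

# Getting ALL matches, not just the first, requires [regex]::Matches().
$digitCount = [regex]::Matches($version, '\d').Count   # 3
```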


Select-String solution:

$packages | 
  Select-String '(?<packageId>[^\d]+)\.(?<version>[\w\d\.-]+)(?=.nupkg)' |
  ForEach-Object {
    "$($_.Matches[0].Groups['packageId'].Value) - $($_.Matches[0].Groups['version'].Value)"
  }

As you can see, working with Select-String's output objects ultimately requires you to work with the same .NET type as when you call [regex]::Match() directly, namely [System.Text.RegularExpressions.Match].
However, no method calls are required, and discovering the properties of the output objects is made easy in PowerShell via the Get-Member cmdlet.
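A minimal sketch of that discovery process, assuming a single hypothetical file name as input (Select-String accepts plain strings from the pipeline, not just files):

```powershell
$regex = '(?<packageId>[^\d]+)\.(?<version>[\w\d\.-]+)(?=.nupkg)'
# Hypothetical input string; no file system access is involved.
$info = 'Alpha.Core.1.0.0.nupkg' | Select-String $regex

# Select-String emits MatchInfo objects...
$info.GetType().FullName              # Microsoft.PowerShell.Commands.MatchInfo
# ...whose .Matches property holds the same .NET Match objects that [regex]::Match() returns.
$info.Matches[0].GetType().FullName   # System.Text.RegularExpressions.Match

# List the properties available at each level.
$info | Get-Member -MemberType Property
$info.Matches[0] | Get-Member -MemberType Property
```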


If you want to capture the matches in a jagged array:

$capturedStrings = @(
  $packages | 
    Select-String '(?<packageId>[^\d]+)\.(?<version>[\w\d\.-]+)(?=.nupkg)' |
    ForEach-Object {
      # Output an array of all capture-group matches, 
      # *as a single object* (note the `, `) 
      , $_.Matches[0].Groups.Where({ $_.Name -ne '0' }).Value 
    }
)

This returns an array of arrays, each element of which is the array of capture-group matches for a given package, so that $capturedStrings[0][0] returns the packageId value for the first package, for instance.
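Once captured, the jagged array can be consumed with ordinary indexing or, more readably, with PowerShell's multiple assignment, which destructures each inner array into separate variables. The two file names below are hypothetical:

```powershell
# Hypothetical package file names standing in for the Downloads folder's contents.
$packages = 'Alpha.Core.1.0.0.nupkg', 'Beta.Tools.2.1.0.nupkg'

$capturedStrings = @(
  $packages |
    Select-String '(?<packageId>[^\d]+)\.(?<version>[\w\d\.-]+)(?=.nupkg)' |
    ForEach-Object { , $_.Matches[0].Groups.Where({ $_.Name -ne '0' }).Value }
)

$capturedStrings[0][0]   # Alpha.Core
$capturedStrings[1][1]   # 2.1.0

# Multiple assignment splits each inner array into ID and version.
$lines = foreach ($pair in $capturedStrings) {
  $id, $version = $pair
  "$id - $version"
}
$lines   # Alpha.Core - 1.0.0, Beta.Tools - 2.1.0
```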

Note:

  • $_.Matches[0].Groups.Where({ $_.Name -ne '0' }).Value programmatically enumerates all capture-group matches and returns their .Value property values as an array, using member-access enumeration; note that the group named '0' must be excluded, as it represents the whole match.

    • With the capture groups in your specific regex, the above is equivalent to the following, as shown in a commented-out line in your question:

      @($_.Matches[0].Groups['packageId'].Value, $_.Matches[0].Groups['version'].Value)
      
  • , ..., the unary form of the array-construction operator, is used as a shortcut for outputting the array (symbolized by ... here) as a whole, as a single object. By default, enumeration would occur and the elements would be emitted one by one. , ... is in effect a shortcut to the conceptually clearer Write-Output -NoEnumerate ... - see this answer for an explanation of the technique.

  • Additionally, @(...), the array-subexpression operator, is needed to ensure that a jagged (nested) array is returned even if only one package (and therefore only one inner array) is found across all $packages.
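The following sketch, using a single hypothetical file name, illustrates why @(...) matters when there is only one package, and that Write-Output -NoEnumerate yields the same result as the unary `,`:

```powershell
$regex  = '(?<packageId>[^\d]+)\.(?<version>[\w\d\.-]+)(?=.nupkg)'
$single = 'Alpha.Core.1.0.0.nupkg'   # hypothetical: only one package present

# Without @(...): the lone inner array becomes the result itself (not jagged).
$withoutAt = $single | Select-String $regex |
  ForEach-Object { , $_.Matches[0].Groups.Where({ $_.Name -ne '0' }).Value }

# With @(...): always an array of arrays, even for a single package.
# Write-Output -NoEnumerate is used here in place of the unary comma.
$withAt = @(
  $single | Select-String $regex |
    ForEach-Object { Write-Output -NoEnumerate ($_.Matches[0].Groups.Where({ $_.Name -ne '0' }).Value) }
)

$withoutAt[0]      # Alpha.Core (the packageId string itself)
$withoutAt[0][0]   # A (only its first character)
$withAt[0][0]      # Alpha.Core
```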
