What good does a SteppablePipeline

Time:07-23

At my company, where I am quite new, they have a wrapper for almost every native PowerShell cmdlet, mainly to add more logging and error handling. I am trying to push back on this and to point instead to the built-in PowerShell feature for creating a proxy command, like:

$GCI = Get-Command Get-ChildItem
[System.Management.Automation.ProxyCommand]::Create($GCI)

But I am lacking some knowledge here.
What is the difference (if any) between using a SteppablePipeline and using the native PowerShell syntax?
In other words, in the Process block, what is the difference between:

$steppablePipeline.Process($_)

and using the native PowerShell syntax:

$_ |Microsoft.PowerShell.Management\Get-ChildItem # In this example

I am aware that I am asking for general information, but it appears to me that there is hardly any information on, e.g., the ScriptBlock.GetSteppablePipeline method.

CodePudding user response:

This venerable blog post from 2009, which introduced proxy functions (wrapper functions), explains that steppable pipelines are required to implement them; the following quote suggests (but doesn't explicitly state) that they may have been created for that very purpose:

In particular, what you want to have happen is to be able to control the execution of the calling command – to control when it’s BEGINPROCESS(), PROCESSRECORD(), ENDPROCESS(), etc methods are called

Simply put, proxy functions, via steppable pipelines, allow you to implement a cmdlet (advanced function) by delegating most of the implementation to another cmdlet in a memory-efficient, streaming manner.
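To see how a generated proxy function relies on a steppable pipeline, you can inspect the code that ProxyCommand.Create() emits. The following is only a sketch for exploration; the variable names $metadata and $proxyBody are chosen here for illustration and are not part of the original answer.

```powershell
# Build the command metadata for the cmdlet to be wrapped.
$metadata = [System.Management.Automation.CommandMetadata]::new((Get-Command Get-ChildItem))

# Generate the source code of a proxy function body as a string.
$proxyBody = [System.Management.Automation.ProxyCommand]::Create($metadata)

# List the generated lines that mention the steppable pipeline.
$proxyBody -split '\r?\n' | Where-Object { $_ -match 'steppablePipeline' }
```

The listed lines show the steppable pipeline being created and driven from the generated begin, process, and end blocks.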

Specifically, a steppable pipeline allows you to delegate the implementation of your proxy function to a script block whose life cycle is kept in sync with the proxy function itself, in terms of initialization (begin block), per-object pipeline input processing (process block), and termination (end block), which means that a single instance of the wrapped cmdlet is in effect directly connected to the same pipeline as the proxy function itself.
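The life cycle described above can also be driven by hand, outside any function, which may make it easier to see that only one instance of the wrapped cmdlet is involved. This is a minimal sketch; the choice of Measure-Object and the sample numbers are assumptions made for illustration.

```powershell
# Create a steppable pipeline around a single Measure-Object instance.
$steppablePipeline = { Measure-Object -Sum }.GetSteppablePipeline()

# Corresponds to the begin block; $true signals that pipeline input is expected.
$steppablePipeline.Begin($true)

# Corresponds to the process block: feed the input objects one at a time.
# Process() returns whatever the wrapped pipeline emitted for that object
# (nothing yet here, because Measure-Object only reports at the end).
foreach ($n in 1, 2, 3) {
  $null = $steppablePipeline.Process($n)
}

# Corresponds to the end block: the same instance now emits its result.
$result = $steppablePipeline.End()
$result.Sum   # sum over all three inputs, collected by one Measure-Object instance
```

Because the same Measure-Object instance saw every input object, the sum covers all of them, which would not be the case if the cmdlet were invoked anew for each object.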

Conversely, this means: you don't strictly need a proxy function to write a wrapper function in the following scenarios:

  • If your wrapper function doesn't need to support pipeline input.

  • If you don't mind collecting all pipeline input first, before passing it all to the wrapped cmdlet at once in your wrapper function's end block, which means that you're forgoing streaming processing.

    • While you may also get streaming processing if you call the wrapped cmdlet for each input object in your process block, doing so:
      • is inefficient (a full invocation of the wrapped cmdlet in every iteration, in a nested pipeline)
      • doesn't work for cmdlets that need to operate on all input as a whole, such as Format-* cmdlets or aggregating cmdlets such as Sort-Object and Group-Object
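The limitation with aggregating cmdlets can be illustrated with two hypothetical wrappers around Sort-Object. The function names Sort-PerObject and Sort-ViaSteppable are made up for this sketch and do not appear in the original answer.

```powershell
# Calls a new Sort-Object instance for every input object, so each
# instance only ever sees one object and cannot sort across the input.
function Sort-PerObject {
  [CmdletBinding()]
  param(
    [Parameter(Mandatory, ValueFromPipeline)]
    $InputObject
  )
  process {
    $InputObject | Sort-Object
  }
}

# Feeds all input objects to a single Sort-Object instance via a
# steppable pipeline, so the cmdlet can operate on the input as a whole.
function Sort-ViaSteppable {
  [CmdletBinding()]
  param(
    [Parameter(Mandatory, ValueFromPipeline)]
    $InputObject
  )
  begin {
    $steppablePipeline = { Sort-Object }.GetSteppablePipeline($MyInvocation.CommandOrigin)
    $steppablePipeline.Begin($PSCmdlet)
  }
  process {
    $steppablePipeline.Process($InputObject)
  }
  end {
    $steppablePipeline.End()
  }
}

$perObject = 3, 1, 2 | Sort-PerObject      # input order is preserved: 3, 1, 2
$steppable = 3, 1, 2 | Sort-ViaSteppable   # sorted as a whole: 1, 2, 3
```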

The following are three different implementations of a wrapper function around Select-String, which reports only the matching part of each matching line, as a string, to illustrate the tradeoffs:

  • Select-MatchProxy is a proper proxy function, i.e. it calls Select-String via a steppable pipeline, which amounts to streaming processing that involves only a single instance of Select-String.

  • Select-MatchSimple calls a new Select-String instance in each process block, which also amounts to streaming processing, but performs poorly; as noted above, this implementation approach isn't always feasible, depending on what cmdlet is being wrapped.

  • Select-MatchCollect collects all pipeline input up front, and then passes it to Select-String in the end block, which forgoes streaming processing and is memory-intensive; however, in terms of runtime it actually performs slightly better than the proxy function.

function Select-MatchProxy {
  [CmdletBinding(PositionalBinding=$false)]
  param(
    [Parameter(Mandatory, ValueFromPipeline)]
    $InputObject,
    [Parameter(Mandatory, Position=0)]
    [string] $Pattern
  )
  begin {
    $steppablePipeline = { 
       Select-String -Pattern $Pattern | ForEach-Object { $_.Matches.Value }
     }.GetSteppablePipeline($myInvocation.CommandOrigin)
    $steppablePipeline.Begin($PSCmdlet)
  }
  process {
    $steppablePipeline.Process($InputObject)
  }
  end {
    $steppablePipeline.End()
  }
}

function Select-MatchSimple {
  [CmdletBinding(PositionalBinding=$false)]
  param(
    [Parameter(Mandatory, ValueFromPipeline)]
    $InputObject,
    [Parameter(Mandatory, Position=0)]
    [string] $Pattern
  )
  process {
    Select-String -InputObject $InputObject -Pattern $Pattern |
      ForEach-Object {
        $_.Matches.Value
      }
  }
}

function Select-MatchCollect {
  [CmdletBinding(PositionalBinding=$false)]
  param(
    [Parameter(Mandatory, ValueFromPipeline)]
    $InputObject,
    [Parameter(Mandatory, Position=0)]
    [string] $Pattern
  )
  begin {
    $l = [System.Collections.Generic.List[object]]::new()
  }
  process {
    $l.Add($InputObject)
  }
  end {
    $l | Select-String -Pattern $Pattern | ForEach-Object { $_.Matches.Value }
  }
}

To compare runtimes, you can use the following code:

# Sample input array of 100,000 strings.
$array = ('foo', 'bar') * 50000
# Time 15 runs of each function, and report the average.
Time-Command { $array | Select-MatchProxy   'o+' }, 
             { $array | Select-MatchSimple  'o+' }, 
             { $array | Select-MatchCollect 'o+' }

Sample timings from a macOS 12.4 M1 Mac running PowerShell Core 7.3.0-preview.6, which give a sense of relative performance:

Factor Secs (15-run avg.) Command                           TimeSpan
------ ------------------ -------                           --------
1.00   0.916              $array | Select-MatchCollect 'o+' 00:00:00.9162298
1.12   1.025              $array | Select-MatchProxy   'o+' 00:00:01.0254835
5.38   4.930              $array | Select-MatchSimple  'o+' 00:00:04.9298495

The above uses the Time-Command function from this Gist.

  • Assuming you have looked at the linked Gist's source code to ensure that it is safe (which I can personally assure you of, but you should always check), you can install it directly as follows:

    irm https://gist.github.com/mklement0/9e1f13978620b09ab2d15da5535d1b27/raw/Time-Command.ps1 | iex
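If you would rather not install anything, a rough comparison is also possible with the built-in Measure-Command cmdlet. The following is only a sketch: Get-AverageRuntime is a hypothetical helper written for this illustration, not the Time-Command function from the Gist, and the two timed script blocks are stand-ins that call Select-String directly rather than the wrapper functions above.

```powershell
# Hypothetical helper: runs a script block several times with
# Measure-Command and returns the average runtime in seconds.
function Get-AverageRuntime {
  param(
    [Parameter(Mandatory)]
    [scriptblock] $ScriptBlock,
    [int] $Count = 15
  )
  $totalSeconds = 0.0
  foreach ($i in 1..$Count) {
    $totalSeconds += (Measure-Command $ScriptBlock).TotalSeconds
  }
  $totalSeconds / $Count
}

# Smaller sample input than in the answer, to keep the sketch quick to run.
$array = ('foo', 'bar') * 5000

# One Select-String instance for the whole input ...
$single = Get-AverageRuntime -Count 3 { $array | Select-String -Pattern 'o' }
# ... versus a new Select-String invocation for every input object.
$perObject = Get-AverageRuntime -Count 3 {
  $array | ForEach-Object { Select-String -InputObject $_ -Pattern 'o' }
}

'{0:N3} s (single instance) vs. {1:N3} s (per-object invocation)' -f $single, $perObject
```

To time the wrapper functions themselves, pass script blocks that call Select-MatchProxy, Select-MatchSimple, and Select-MatchCollect instead of the stand-ins.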
    