In our company, where I am quite new, they have a wrapper for almost every native PowerShell cmdlet, mainly to add more logging and error handling. I am trying to push back on this and to point to the built-in PowerShell feature for creating a proxy command, like:
$GCI = Get-Command Get-ChildItem
[System.Management.Automation.ProxyCommand]::Create($GCI)
But I am lacking some knowledge here.
What is the difference (if any) between a SteppablePipeline and using the native PowerShell syntax?
In other words, in the process block, what is the difference between:
$steppablePipeline.Process($_)
and using the native PowerShell syntax:
$_ | Microsoft.PowerShell.Management\Get-ChildItem # In this example
I am aware that I am asking for general information, but it appears to me that there is hardly any information on e.g. the ScriptBlock.GetSteppablePipeline method.
CodePudding user response:
This venerable blog post from 2009, which introduced proxy functions (wrapper functions), explains that steppable pipelines are required to implement them; the following quote suggests (but doesn't explicitly state) that they may have been created for that very purpose:
In particular, what you want to have happen is to be able to control the execution of the calling command – to control when it’s BEGINPROCESS(), PROCESSRECORD(), ENDPROCESS(), etc methods are called
Simply put, proxy functions, via steppable pipelines, allow you to implement a cmdlet (advanced function) by delegating most of the implementation to another cmdlet in a memory-efficient, streaming manner.
Specifically, a steppable pipeline allows you to delegate the implementation of your proxy function to a script block whose life cycle is kept in sync with the proxy function itself, in terms of initialization (begin block), per-object pipeline input processing (process block), and termination (end block), which means that a single instantiation of the wrapped cmdlet is in effect directly connected to the same pipeline as the proxy function itself.
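To make the difference from the question concrete, here is a minimal sketch (not taken from the blog post or the generated proxy code) that drives a steppable pipeline by hand and compares it with a nested pipeline per input object. Measure-Object is used only because it makes the number of instantiations visible; the variable names are illustrative.

```powershell
# One Measure-Object instance, fed object by object via Process().
$sp = { Measure-Object }.GetSteppablePipeline()
$sp.Begin($true)                       # $true: input will arrive via Process()
foreach ($item in 'a', 'b', 'c') {
    $null = $sp.Process($item)         # nothing is emitted yet; the cmdlet is still collecting
}
$single = @($sp.End())                 # End() lets the single instance emit its result

# Native syntax inside a loop: a new Measure-Object instance per object.
$nested = @(foreach ($item in 'a', 'b', 'c') { $item | Measure-Object })

"steppable: $($single.Count) result, counting $($single[0].Count) objects"
"nested:    $($nested.Count) results, each counting $($nested[0].Count) object"
```

In a proxy function, the call is Begin($PSCmdlet) rather than Begin($true), so output flows straight into the caller's pipeline instead of being returned from Process() and End().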
Conversely, this means: you don't strictly need a proxy function to write a wrapper function in the following scenarios:
- If your wrapper function doesn't need to support pipeline input.
- If you don't mind collecting all pipeline input first, before passing it all to the wrapped cmdlet at once, in your wrapper function's end block, which means that you're forgoing streaming processing.
  - While you may also get streaming processing if you call the wrapped cmdlet for each input object in your process block, doing so:
    - is inefficient (a full invocation of the wrapped cmdlet in every iteration, in a nested pipeline)
    - doesn't work for cmdlets that need to operate on all input as a whole, such as Format-* cmdlets or aggregating cmdlets such as Sort-Object and Group-Object
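The limitation for aggregating cmdlets can be illustrated with a short sketch. The two function names below are hypothetical and exist only for this comparison: one calls Sort-Object per input object in its process block, the other delegates to a single Sort-Object instance via a steppable pipeline.

```powershell
# Hypothetical wrapper: a fresh Sort-Object call for every input object.
function Invoke-SortPerObject {
    [CmdletBinding()]
    param([Parameter(ValueFromPipeline)] $InputObject)
    process { $InputObject | Sort-Object }   # each call only ever sees one object
}

# Hypothetical wrapper: one Sort-Object instance for the whole pipeline.
function Invoke-SortProxy {
    [CmdletBinding()]
    param([Parameter(ValueFromPipeline)] $InputObject)
    begin {
        $sp = { Sort-Object }.GetSteppablePipeline($MyInvocation.CommandOrigin)
        $sp.Begin($PSCmdlet)
    }
    process { $sp.Process($InputObject) }
    end { $sp.End() }
}

$perObject = 3, 1, 2 | Invoke-SortPerObject   # input order is preserved, nothing is sorted
$proxied   = 3, 1, 2 | Invoke-SortProxy       # sorted, because one instance saw all input
```

The per-object variant has nothing to sort against, whereas the proxy variant behaves like Sort-Object itself.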
The following are three different implementations of a wrapper function around Select-String, each of which reports only the matching part of each matching line, as a string, to illustrate the tradeoffs:
- Select-MatchProxy is a proper proxy function, i.e. it calls Select-String via a steppable pipeline, which amounts to streaming processing that only involves a single instantiation of Select-String.
  - It is based on a stripped-down version of the scaffolding code that [System.Management.Automation.ProxyCommand]::Create((Get-Command 'Select-String')) generates.
  - GitHub issue #10863 discusses potential improvements to the code that [System.Management.Automation.ProxyCommand]::Create() generates.
- Select-MatchSimple calls a new Select-String instance in each process block, which also amounts to streaming processing, but performs poorly; as noted above, this implementation approach isn't always feasible, depending on what cmdlet is being wrapped.
- Select-MatchCollect collects all pipeline input up front, and then passes it to Select-String in the end block, which forgoes streaming processing and is memory-intensive; however, in terms of runtime it actually performs slightly better than the proxy function.
function Select-MatchProxy {
  [CmdletBinding(PositionalBinding=$false)]
  param(
    [Parameter(Mandatory, ValueFromPipeline)]
    $InputObject,
    [Parameter(Mandatory, Position=0)]
    [string] $Pattern
  )
  begin {
    $steppablePipeline = {
      Select-String -Pattern $Pattern | ForEach-Object { $_.Matches.Value }
    }.GetSteppablePipeline($myInvocation.CommandOrigin)
    $steppablePipeline.Begin($PSCmdlet)
  }
  process {
    $steppablePipeline.Process($InputObject)
  }
  end {
    $steppablePipeline.End()
  }
}
function Select-MatchSimple {
  [CmdletBinding(PositionalBinding=$false)]
  param(
    [Parameter(Mandatory, ValueFromPipeline)]
    $InputObject,
    [Parameter(Mandatory, Position=0)]
    [string] $Pattern
  )
  process {
    Select-String -InputObject $InputObject -Pattern $Pattern |
      ForEach-Object {
        $_.Matches.Value
      }
  }
}
function Select-MatchCollect {
  [CmdletBinding(PositionalBinding=$false)]
  param(
    [Parameter(Mandatory, ValueFromPipeline)]
    $InputObject,
    [Parameter(Mandatory, Position=0)]
    [string] $Pattern
  )
  begin {
    $l = [System.Collections.Generic.List[object]]::new()
  }
  process {
    $l.Add($InputObject)
  }
  end {
    $l | Select-String -Pattern $Pattern | ForEach-Object { $_.Matches.Value }
  }
}
To compare runtimes, you can use the following code:
# Sample input array of 100,000 strings.
$array = ('foo', 'bar') * 50000
# Time 15 runs of each function, and report the average.
Time-Command -Count 15 { $array | Select-MatchProxy 'o ' },
                       { $array | Select-MatchSimple 'o ' },
                       { $array | Select-MatchCollect 'o ' }
Sample timings from a macOS 12.4 M1 Mac running PowerShell Core 7.3.0-preview.6, which give a sense of relative performance:
Factor Secs (15-run avg.) Command                            TimeSpan
------ ------------------ -------                            --------
1.00   0.916              $array | Select-MatchCollect 'o '  00:00:00.9162298
1.12   1.025              $array | Select-MatchProxy 'o '    00:00:01.0254835
5.38   4.930              $array | Select-MatchSimple 'o '   00:00:04.9298495
The above uses the Time-Command function from this Gist.
Assuming you have looked at the linked Gist's source code to ensure that it is safe (which I can personally assure you of, but you should always check), you can install it directly as follows:
irm https://gist.github.com/mklement0/9e1f13978620b09ab2d15da5535d1b27/raw/Time-Command.ps1 | iex
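Returning to the logging scenario from the question, a proxy along the following lines may be sufficient. This is a minimal sketch under stated assumptions rather than the exact code that ProxyCommand.Create() emits: the function name Get-ChildItemLogged is hypothetical, only the -Path parameter is redeclared, Write-Verbose stands in for whatever logging the company wrappers perform, and the error handling of the generated scaffolding is omitted.

```powershell
# Hypothetical logging proxy around Get-ChildItem (only -Path is exposed here).
function Get-ChildItemLogged {
    [CmdletBinding()]
    param(
        [Parameter(Position = 0, ValueFromPipeline, ValueFromPipelineByPropertyName)]
        [string[]] $Path
    )
    begin {
        Write-Verbose 'Get-ChildItemLogged: starting'   # stand-in for real logging
        # Resolve the wrapped cmdlet by its module-qualified name.
        $wrapped = $ExecutionContext.InvokeCommand.GetCommand(
            'Microsoft.PowerShell.Management\Get-ChildItem',
            [System.Management.Automation.CommandTypes]::Cmdlet)
        # Pass through whatever parameters the caller bound explicitly.
        $sp = { & $wrapped @PSBoundParameters }.GetSteppablePipeline($MyInvocation.CommandOrigin)
        $sp.Begin($PSCmdlet)
    }
    process {
        $sp.Process($_)          # forward each pipeline object to the single cmdlet instance
    }
    end {
        $sp.End()
        Write-Verbose 'Get-ChildItemLogged: finished'
    }
}
```

Calling it as Get-ChildItemLogged -Path . -Verbose should list the current directory while emitting the two verbose messages; a full proxy would redeclare all of Get-ChildItem's parameters, which is what the generated scaffolding does for you.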