I was debugging a pipeline and finally realized all objects referenced the same hashtable. It was a surprise. Is the following logic correct?

In the following pipeline, every instance has a property "MasterHash" with a reference to the same hashtable. The NotePropertyValue is calculated only once, when the pipeline starts. Without a ForEach-Object or Where-Object, all expressions can be considered to be evaluated before any items are piped.

$myObjects | Add-Member -NotePropertyName 'MasterHash' -NotePropertyValue @{x='y'}

If a unique hashtable is required for each item, ForEach-Object is required, as follows:

$myObjects | ForEach-Object { Add-Member -InputObject $_ -NotePropertyName 'MyHash' -NotePropertyValue @{x='y'} }

$myObjects | ForEach-Object { $_ | Add-Member -NotePropertyName 'MyHash' -NotePropertyValue @{x='y'} }

I find it a little confusing. Are the expressions evaluated in a begin/process block? Or perhaps they are calculated when the script is compiled? Is ForEach-Object the correct way to access the values in the pipeline? Is there a better syntax to access the pipeline values? For example,

$myObjects | Add-Member -NotePropertyName 'NotMasterHash' -NotePropertyValue ???@{x='y'}???

where the ??? is some magic way to make these evaluate the expression for every item in the pipe.

Answer:

You can use Trace-Command to see the under-the-hood parameter binding for each of your pipeline examples (see the Trace-Command documentation for details).

Trace-Command -Name ParameterBinding -PSHost -FilePath debug.txt -Expression {
  <# your pipeline here #>
}

You will be able to see both when and how often each expression is evaluated.
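Here is a sketch of the trace applied to the question's first pipeline. The two placeholder objects stand in for $myObjects, which the question doesn't define. In the trace you should see the named arguments, including the hashtable, bound once under "BIND NAMED cmd line args". The "BIND PIPELINE object to parameters" section should repeat for each object.

```powershell
# Placeholder objects standing in for the question's $myObjects (assumption)
$myObjects = [pscustomobject]@{ Name = 'a' }, [pscustomobject]@{ Name = 'b' }

# Trace parameter binding of the first pipeline; trace lines are written to the host
Trace-Command -Name ParameterBinding -PSHost -Expression {
    $myObjects | Add-Member -NotePropertyName 'MasterHash' -NotePropertyValue @{ x = 'y' }
}

# Both objects end up holding the very same hashtable instance
[object]::ReferenceEquals($myObjects[0].MasterHash, $myObjects[1].MasterHash)
```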

Generally speaking, each expression is evaluated when its command is invoked in the pipeline; the expressions are not front-loaded. You can test this by injecting a divide-by-zero error at the end of the pipeline. If the earlier commands run, then the expression at the end isn't evaluated until the command at that stage of the pipeline starts (or until parameter binding for that command begins).
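Below is a minimal sketch of that divide-by-zero test. The list that records upstream items, the use of Select-Object as the last command, and the $zero variable are illustrative choices and were not part of the original suggestion. The count printed at the end shows whether the first command produced anything before the failing argument was evaluated.

```powershell
$zero = 0
# Records every item the first stage of the pipeline actually emits
$upstream = [System.Collections.Generic.List[int]]::new()

try {
    1..3 |
        ForEach-Object { $upstream.Add($_); $_ } |
        Select-Object -First (1 / $zero)   # this argument throws when it is evaluated
}
catch {
    Write-Host "Caught: $($_.Exception.Message)"
}

# 0 means the failing argument was evaluated before any upstream item was produced
Write-Host "Upstream items produced: $($upstream.Count)"
```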

Answer:

In general, all arguments for pipeline commands are bound once before the pipeline starts (even before the first begin{} block runs!). When the whole pipeline runs again (e.g. in a loop), the arguments are bound again.

This only applies to the current scope level. When you create a new scope (script block) within a pipeline, e.g. by using ForEach-Object, arguments within this new scope are bound on each invocation of the script block. Also, parameters that accept pipeline input can be late-bound by passing a script block (a so-called delay-bind script block).

Regarding your 1st code sample...

 $myObjects | Add-Member -NotePropertyName 'MasterHash' -NotePropertyValue @{x='y'}

... a single hashtable is created once before the pipeline starts and then passed by reference to Add-Member. Thus all array elements of $myObjects will store this reference to the same hashtable.
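You can confirm this by comparing the references directly with [object]::ReferenceEquals and then changing the hashtable through one object. The sketch below uses two placeholder objects for $myObjects:

```powershell
# Placeholder objects (assumption; the question doesn't show $myObjects)
$myObjects = [pscustomobject]@{ Id = 1 }, [pscustomobject]@{ Id = 2 }

# @{ x = 'y' } is evaluated once, before Add-Member receives the first object
$myObjects | Add-Member -NotePropertyName 'MasterHash' -NotePropertyValue @{ x = 'y' }

# Same instance on both objects...
[object]::ReferenceEquals($myObjects[0].MasterHash, $myObjects[1].MasterHash)   # True

# ...so a change made through one object is visible through the other
$myObjects[0].MasterHash.x = 'changed'
$myObjects[1].MasterHash.x   # changed
```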

Regarding your 2nd and 3rd code samples...

 $myObjects | ForEach-Object { Add-Member -InputObject $_ -NotePropertyName 'MyHash' -NotePropertyValue @{x='y'} }

... what I wrote about new scopes applies, so a new instance of the hashtable will be created for each array element.
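Running the same check on the ForEach-Object variant, again with placeholder objects, should report different references. A change made through one object also no longer shows up in the other:

```powershell
$myObjects = [pscustomobject]@{ Id = 1 }, [pscustomobject]@{ Id = 2 }

# The script block runs once per object, so @{ x = 'y' } is evaluated once per object
$myObjects | ForEach-Object {
    Add-Member -InputObject $_ -NotePropertyName 'MyHash' -NotePropertyValue @{ x = 'y' }
}

[object]::ReferenceEquals($myObjects[0].MyHash, $myObjects[1].MyHash)   # False

$myObjects[0].MyHash.x = 'changed'
$myObjects[1].MyHash.x   # y
```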

For a better understanding we can do some experiments by writing our own pipeline-compatible functions that do some logging.

Test Pipeline Functions

In the following, the function Test-Producer simply outputs the two letters 'A' and 'B' as individual elements, which are consumed by Test-Consumer. The Add-Counter function returns an incremented number for each invocation, which we will use as argument for the pipeline functions.

$script:counter = 1

Function Add-Counter {
    Write-Host "Add-Counter returns $script:counter"
    ($script:counter++)  # output the current value, then increment
}

Function Test-Producer {
    [CmdletBinding()]
    param (
        [Parameter()]
        [int] $counter
    )

    begin {
        Write-Host "Test-Producer begin:   { counter: $counter }"
    }
    process {
        Write-Host "Test-Producer process: { counter: $counter }"
        'A', 'B'  # Outputs two individual elements, not an array
    }
    end {
        Write-Host "Test-Producer end:     { counter: $counter }"
    }
}

Function Test-Consumer {
    [CmdletBinding()]
    param (
        [Parameter(ValueFromPipeline)]
        [String] $inputObject,

        [Parameter()]
        [String] $counter
    )

    begin {
        Write-Host "Test-Consumer begin:   { inputObject: $inputObject; counter: $counter }"
    }
    process {
        Write-Host "Test-Consumer process: { inputObject: $inputObject; counter: $counter }"
    }
    end {
        Write-Host "Test-Consumer end:     { inputObject: $inputObject; counter: $counter }"
    }
}

Experiment 1 - regular pipeline arguments

Pipeline:

Test-Producer -counter (Add-Counter) | Test-Consumer -counter (Add-Counter)

Output:

Add-Counter returns 1
Add-Counter returns 2
Test-Producer begin:   { counter: 1 }
Test-Consumer begin:   { inputObject: ; counter: 2 } 
Test-Producer process: { counter: 1 }
Test-Consumer process: { inputObject: A; counter: 2 }
Test-Consumer process: { inputObject: B; counter: 2 }
Test-Producer end:     { counter: 1 }
Test-Consumer end:     { inputObject: B; counter: 2 }

We can see that Add-Counter gets called two times at the beginning, confirming that pipeline arguments are indeed bound very early, before the pipeline starts. As a consequence, the counter parameter of each command stays at the initial value throughout the pipeline execution.
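The same kind of counter can test the earlier point that arguments are bound again when the whole pipeline runs again, for example inside a loop. In this sketch, Get-Tag is a hypothetical helper that counts its own evaluations, and the batches are placeholder data. Get-Tag should be called once per pipeline run, not once per object.

```powershell
$script:evaluations = 0

# Hypothetical helper: returns a new tag and counts how often it is evaluated
function Get-Tag {
    $script:evaluations++
    "tag$script:evaluations"
}

# Two placeholder batches of objects
$batches = @(
    @([pscustomobject]@{ Id = 1 }, [pscustomobject]@{ Id = 2 }),
    @([pscustomobject]@{ Id = 3 }, [pscustomobject]@{ Id = 4 })
)

foreach ($batch in $batches) {
    # (Get-Tag) is evaluated once per run of this pipeline, not once per object
    $batch | Add-Member -NotePropertyName 'Tag' -NotePropertyValue (Get-Tag)
}

$batches | ForEach-Object { $_ } | ForEach-Object { "$($_.Id): $($_.Tag)" }
# 1: tag1, 2: tag1, 3: tag2, 4: tag2
```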

Experiment 2 - ForEach-Object

Pipeline:

Test-Producer -counter (Add-Counter) | ForEach-Object { 
    $_ | Test-Consumer -counter (Add-Counter) 
}

Output:

Add-Counter returns 1
Test-Producer begin:   { counter: 1 }
Test-Producer process: { counter: 1 }
Add-Counter returns 2
Test-Consumer begin:   { inputObject: ; counter: 2 } 
Test-Consumer process: { inputObject: A; counter: 2 }
Test-Consumer end:     { inputObject: A; counter: 2 }
Add-Counter returns 3
Test-Consumer begin:   { inputObject: ; counter: 3 } 
Test-Consumer process: { inputObject: B; counter: 3 }
Test-Consumer end:     { inputObject: B; counter: 3 }
Test-Producer end:     { counter: 1 }

The output of this pipeline looks a bit different, because we now call Test-Consumer from within a new scope (the script block of ForEach-Object).

Add-Counter gets called only once before the pipeline starts (for the Test-Producer argument). Then it gets called two more times, once before each call of Test-Consumer. Note that Add-Counter is still called before the sub-pipeline $_ | Test-Consumer runs, so the rules of pipeline argument binding still apply to this sub-pipeline.

Experiment 3 - delay-bind script block arguments

As I wrote at the beginning, parameters that accept pipeline input can be late-bound. To test this, we create Test-Consumer2, a copy of Test-Consumer whose counter parameter has the ValueFromPipelineByPropertyName attribute:

Function Test-Consumer2 {
    [CmdletBinding()]
    param (
        [Parameter(ValueFromPipeline)]
        [String] $inputObject,

        [Parameter(ValueFromPipelineByPropertyName)]
        [String] $counter
    )

    begin {
        Write-Host "Test-Consumer2 begin:   { inputObject: $inputObject; counter: $counter }"
    }
    process {
        Write-Host "Test-Consumer2 process: { inputObject: $inputObject; counter: $counter }"
    }
    end {
        Write-Host "Test-Consumer2 end:     { inputObject: $inputObject; counter: $counter }"
    }
}

Pipeline:

Test-Producer -counter (Add-Counter) | Test-Consumer2 -counter { Add-Counter }

Note the difference compared to the 1st experiment: now we wrap Add-Counter in curly braces to create a script block. Had we used parentheses instead, the counter would again be bound only once, exactly as in the 1st experiment.

Output:

Add-Counter returns 1
Test-Producer begin:   { counter: 1 }
Test-Consumer2 begin:   { inputObject: ; counter:  }
Test-Producer process: { counter: 1 }
Add-Counter returns 2
Test-Consumer2 process: { inputObject: A; counter: 2 }
Add-Counter returns 3
Test-Consumer2 process: { inputObject: B; counter: 3 }
Test-Producer end:     { counter: 1 }
Test-Consumer2 end:     { inputObject: B; counter: 3 }

Now the process{} block of Test-Consumer2 gets a different counter value for each input element, similar to the 2nd experiment that used ForEach-Object. The difference from the 2nd experiment is that the Add-Counter calls are interleaved with the process{} calls. This is more efficient because begin{} and end{} of Test-Consumer2 are called only once. In the 2nd experiment, they were called for each input element.

Note that this doesn't work with Add-Member. You can't pass a script block to the NotePropertyValue parameter and have it evaluated per object, because that parameter doesn't accept pipeline input. The script block itself would simply become the property value.
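Here is a sketch of what happens if you try it anyway, using placeholder objects. The script block is not evaluated per object. It is bound as an ordinary value, so every object ends up with the same script block.

```powershell
$objs = [pscustomobject]@{ Id = 1 }, [pscustomobject]@{ Id = 2 }

# -NotePropertyValue takes no pipeline input, so there is no delay binding:
# the script block object itself becomes the (shared) property value
$objs | Add-Member -NotePropertyName 'MyHash' -NotePropertyValue { @{ x = 'y' } }

$objs[0].MyHash.GetType().FullName                            # System.Management.Automation.ScriptBlock
[object]::ReferenceEquals($objs[0].MyHash, $objs[1].MyHash)   # True
```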

Conclusion

To create a new instance of the hashtable for each array element, you could use ForEach-Object, but it's not the most efficient way.

A faster way is the intrinsic .ForEach method, as it doesn't involve pipeline overhead:

$myObjects = [PSCustomObject]@{}, [PSCustomObject]@{}
$myObjects.ForEach{ 
    Add-Member -InputObject $_ -NotePropertyName 'MyHash' -NotePropertyValue @{x='y'} 
}

# Compare references to confirm they are different
$myObjects[0].MyHash -eq $myObjects[1].MyHash

This outputs False, as each array element contains a new instance of the hashtable. Here the -eq operator compares only the references, which are different.
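A plain foreach statement also avoids the pipeline and evaluates the hashtable literal once per element. The sketch below uses [object]::ReferenceEquals to make the reference comparison explicit. Whether it is faster than .ForEach on your data is worth measuring (for example with Measure-Command) rather than assuming.

```powershell
$myObjects = [pscustomobject]@{ Id = 1 }, [pscustomobject]@{ Id = 2 }

# The loop body, including the @{ x = 'y' } literal, runs once per element
foreach ($obj in $myObjects) {
    Add-Member -InputObject $obj -NotePropertyName 'MyHash' -NotePropertyValue @{ x = 'y' }
}

# Explicit reference comparison instead of -eq
[object]::ReferenceEquals($myObjects[0].MyHash, $myObjects[1].MyHash)   # False
```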

Answer:

I think I see what you mean. In the first example, Add-Member is only run once. Some commands, like Sort-Object, block the pipeline until they have gathered all the elements. It looks like Add-Member does this, even in PowerShell 7. $myObjects looks the same in both cases. Write-Host can be a good debugging tool, since it writes outside the pipeline. Sometimes you need $() to run several statements where only a single pipeline is allowed, as in the hashtable values below.

# case 1 add-member only
$myobjects = 'item
a
b
c' | convertfrom-csv


$myObjects | Add-Member MyHash @{x=$(write-host hi; get-random)}

hi


$myobjects

item MyHash
---- ------
a    {x}
b    {x}
c    {x}


$myobjects.MyHash  # values all the same

Name                           Value
----                           -----
x                              442950873
x                              442950873
x                              442950873



# case 2 with foreach-object added
$myobjects = 'item
a
b
c' | convertfrom-csv


$myObjects | % { $_ | Add-Member MyHash @{x=$(write-host hi;get-random)} }

hi
hi
hi


$myobjects

item MyHash
---- ------
a    {x}
b    {x}
c    {x}


$myobjects.myhash # values different

Name                           Value
----                           -----
x                              444785508
x                              1083187689
x                              137326227

Answer:

If you want a rough idea of what the Add-Member cmdlet is doing in this specific case, you can picture it this way:

$objects = 0..10 | ForEach-Object {
    [pscustomobject]@{
        Val = $_
    }
}

function Add-Member2 {
[cmdletbinding()]
param(
    [parameter(Mandatory)]
    [string]$NotePropertyName,
    [parameter(Mandatory)]
    [object]$NotePropertyValue,
    [parameter(ValueFromPipeline, Mandatory)]
    [object]$InputObject
)
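    process {
        # $NotePropertyValue was bound once, before the pipeline started, so each
        # object gets its own note property, but all of them point to the same hashtable
        $newProp = [psnoteproperty]::new(
            $NotePropertyName,
            $NotePropertyValue
        )
        $InputObject.PSObject.Properties.Add($newProp)
    }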
}

$objects | Add-Member2 -NotePropertyName 'MasterHash' -NotePropertyValue @{ x = 'y' }
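For contrast, here is a sketch of a hypothetical helper, Add-UniqueNoteProperty, which is not a built-in cmdlet. It takes a script block and invokes it inside process{}, which gives the "evaluate the expression for every item" behavior the question asks for, but explicitly.

```powershell
# Hypothetical helper (not a built-in cmdlet): invokes a script block once per
# input object, so every object receives a freshly created value
function Add-UniqueNoteProperty {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)]
        [string] $Name,

        [Parameter(Mandatory)]
        [scriptblock] $ValueFactory,

        [Parameter(Mandatory, ValueFromPipeline)]
        [object] $InputObject
    )

    process {
        # Evaluated per object, unlike a plain argument that is bound once up front
        $value = & $ValueFactory
        $InputObject.PSObject.Properties.Add([psnoteproperty]::new($Name, $value))
    }
}

$objects = 0..2 | ForEach-Object { [pscustomobject]@{ Val = $_ } }
$objects | Add-UniqueNoteProperty -Name 'MyHash' -ValueFactory { @{ x = 'y' } }

[object]::ReferenceEquals($objects[0].MyHash, $objects[1].MyHash)   # False
```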