Maintain one-at-a-time pipeline processing when storing output in an intermediary variable

Time:10-21

By way of illustration, here is a function that outputs ten integers.

function Foo { for($i = 0; $i -lt 10; $i++) { Write-Host Inner $i; $i; }}

When we call Foo as follows, Foo iterates only five times. That's what we want.

Foo | Select-Object -First 5

When we call Foo like this, Foo iterates all ten times. That's what we want to avoid.

$foo = Foo; $foo | Select-Object -First 5;

Sometimes an intermediary variable is useful for readability.

How, if at all, can we maintain PowerShell's one-at-a-time processing when using an intermediary variable?


On request, here is an elaboration of why we might want to do this in the real world. It is still contrived, but it gets the idea across. The following outputs the first line that declares a public struct in any C# file that is under version control.

Get-ChildItem -Directory -Recurse | 
 Where-Object { Test-Path (Join-Path $_.FullName ".git") } | 
  Select-Object -ExpandProperty FullName | 
   Get-ChildItem -File -Recurse -Filter *.cs | 
    Get-Content | 
     Select-String "public struct" |
      Select-Object -First 1;

It's a fine query, but arguably an explanatory variable or two would be useful.

$gitRepositories = Get-ChildItem -Directory -Recurse | 
 Where-Object { Test-Path (Join-Path $_.FullName ".git") };

$csharpFiles = $gitRepositories | 
 Select-Object -ExpandProperty FullName | 
  Get-ChildItem -File -Recurse -Filter *.cs 

$structNames = $csharpFiles | 
 Get-Content | 
  Select-String "public struct" |
   Select-Object -First 1;

When wrapped with Measure-Command, the first query takes 8.5 seconds and the second query takes several minutes.

CodePudding user response:

If you make your function an advanced function, by way of [CmdletBinding()], like this:

function Foo { 
    [CmdletBinding()]
    param()

    for($i = 0; $i -lt 10; $i++) { 
        Write-Host Inner $i; $i; 
    }
}

Then you gain access to the common parameter -OutVariable, and now you can do this:

Foo -OutVariable foo | Select-Object -First 5

$foo

Of course, in your example, you could just as easily do:

$foo = Foo | Select-Object -First 5

So I'm not certain exactly what you want, but I do suspect that -OutVariable fits the bill better, because it is populated as the pipeline goes.

So for example:

function Foo { 
    [CmdletBinding()]
    param()

    for($i = 0; $i -lt 10; $i++) { 
        Write-Host Inner $i; $i;
        sleep -s 2 
    }
}

(now the function sleeps 2 seconds on every iteration)

If you then press Ctrl+C in the middle of it, $foo will contain what was produced so far.
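
As a rough sketch of the -OutVariable behaviour described above, here is a quicker variant without Write-Host or sleep. The names Get-Numbers and collected are made up for illustration; the point is only that the variable fills up as items are emitted and is not a lazy sequence.

```powershell
# Hypothetical stand-in for Foo; [CmdletBinding()] is what enables -OutVariable.
function Get-Numbers {
    [CmdletBinding()]
    param()

    for ($i = 0; $i -lt 10; $i++) {
        $i
    }
}

# Items are appended to $collected as they are emitted, while still flowing
# down the pipeline to Select-Object.
$firstFive = Get-Numbers -OutVariable collected | Select-Object -First 5

# $collected is an ArrayList holding only what the function managed to emit
# before Select-Object stopped the pipeline.
$collected.GetType().Name
$collected.Count
```

Note that piping $collected onward later replays the stored items; it does not resume the function.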


With further comments, I think I see what you're after now: a return value from the function that can be enumerated later. This is an Enumerator.

There are two complications to returning one from your function:

  1. PowerShell tends to automatically "unroll" (process) these on return, but you can probably get around this with the classic trick of returning a single element array, where the element is the enumerator.
  2. Creating the enumerator without pre-processing all the items is going to depend on the specifics of what is being enumerated.
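
To illustrate the first point, here is a small sketch comparing a function that emits an enumerator directly with one that wraps it using the unary comma. The function names are hypothetical, and an array enumerator stands in for a custom one.

```powershell
# Without the comma, PowerShell enumerates the IEnumerator when the function
# emits it, so the caller receives the items rather than the enumerator.
function Get-Unrolled { (1..3).GetEnumerator() }

# The unary comma wraps the enumerator in a one-element array; PowerShell
# unrolls that outer array and leaves the enumerator itself untouched.
function Get-Wrapped { , (1..3).GetEnumerator() }

$unrolled = Get-Unrolled
$wrapped = Get-Wrapped

$unrolled -is [System.Collections.IEnumerator]   # False: it is an array of 1, 2, 3
$wrapped -is [System.Collections.IEnumerator]    # True: still an enumerator
```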

To create an enumerator you'll need to create a type (a class) that implements the IEnumerator interface. You can make this class do whatever you want, so that's where the complexity of the implementation will go.

Your function then will really just serve as a PowerShell interface to instantiating and returning such a thing.

class MyEnumerator : System.Collections.IEnumerator
{
    hidden [int] $index;
    hidden [int] $max;
    hidden [int] $start;
    hidden [int] $item;

    MyEnumerator([int]$max, [int]$start=0) {
        $this.index = -1
        $this.start = $start
        $this.max = $max
    }
    [bool] MoveNext() {
        $this.index++
        $this.item = $this.index + $this.start

        # demonstration
        if ($this.item -gt 7) {
            throw
        }

        return ($this.item -le $this.max) 
    }

    [void] Reset() {
        $this.index = -1
    }

    [object] get_Current() {
        return $this.item
    }
}

function Foo ($start, $max) {
    ,[MyEnumerator]::new($max, $start)
}

So here in this example, I've created this enumerator class, and you can give it any start and max integer value for it to enumerate through. But I threw in a little catch in that it's hardcoded to throw an exception if the item value is greater than 7.

The function creates the enumerator and returns it (notice the unary comma for making a single element array).

So you can run this like so:

$foo = Foo 2 10

# no exception!

$foo | Select -First 5

# no exception!

This on the other hand:

$foo = Foo 2 10
$foo
# will throw

(you've attempted to enumerate through the whole thing, implicitly)

One thing to be aware of is that $foo is the enumerator object and keeps state, so you are responsible for managing that state if you want to re-use it.

Example:

$foo = Foo 2 10
$foo | Select -First 5

$foo.Current
# 6

$foo | Select -First 5
# 7
# exception!

You can use $foo.Reset() to make it go back to the beginning.
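
If you would rather manage the state explicitly, you can also drive an enumerator by hand with MoveNext(), Current and Reset(). The following is only a sketch: RangeEnumerator is a hypothetical, simplified cousin of the class above (no deliberate exception), written so the snippet runs on its own.

```powershell
# Hypothetical minimal enumerator over the integers $first..$last.
class RangeEnumerator : System.Collections.IEnumerator {
    hidden [int] $first
    hidden [int] $last
    hidden [int] $value

    RangeEnumerator([int]$first, [int]$last) {
        $this.first = $first
        $this.last = $last
        $this.value = $first - 1   # positioned before the first item
    }

    [bool] MoveNext() {
        if ($this.value -ge $this.last) { return $false }
        $this.value++
        return $true
    }

    [void] Reset() {
        $this.value = $this.first - 1
    }

    [object] get_Current() {
        return $this.value
    }
}

$e = [RangeEnumerator]::new(2, 10)

# Pull exactly three items; nothing beyond the third is ever computed.
$taken = @()
while ($taken.Count -lt 3 -and $e.MoveNext()) {
    $taken += $e.Current
}

# Rewind and read the first item again.
$e.Reset()
$null = $e.MoveNext()
$afterReset = $e.Current
```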


Another alternative: returning a function (or just a script block) from a function.

This one won't behave as naturally as the enumerator, but it'll be easier to implement.

function Foo {
    {
        for($i = 0; $i -lt 10; $i++) { Write-Host Inner $i; $i; }
    }
}

$foo = Foo ; & $foo | Select-Object -First 5

For this, since the return value is a script block, you'll have to execute it before piping it along.
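
A slightly extended sketch of the same idea, assuming you want to pass parameters to the deferred work: the script block's GetNewClosure() method captures the function's local variables. New-Sequence and the Stats counter are hypothetical names, added only to make the number of iterations observable.

```powershell
# Hypothetical helper: returns a deferred sequence plus a counter that records
# how many loop iterations have actually run.
function New-Sequence {
    param([int]$Max = 10)

    $stats = @{ Iterations = 0 }

    # GetNewClosure() captures $Max and $stats from this scope.
    $block = {
        for ($i = 0; $i -lt $Max; $i++) {
            $stats.Iterations++
            $i
        }
    }.GetNewClosure()

    [pscustomobject]@{ Block = $block; Stats = $stats }
}

# Creating the sequence runs nothing yet.
$seq = New-Sequence -Max 10

# Invoking the block streams items; Select-Object stops it after three.
$firstThree = & $seq.Block | Select-Object -First 3
$seq.Stats.Iterations
```

Each invocation with & starts the loop from the beginning again, so unlike the enumerator there is no position to reset.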

I imagine what you were really looking for was the enumerator, but you can see that that will require some heavier lifting to implement.

CodePudding user response:

The code works fine for the actual objects you collect in $foo - which is an array of integers 0 - 9. The Write-Host will always output to the console*.

With your example the output is

Inner 0
Inner 1
Inner 2
Inner 3
Inner 4
Inner 5
Inner 6
Inner 7
Inner 8
Inner 9
0
1
2
3
4

As you can see the select-object -first 5 worked just fine, selecting 0 - 4.

So to clarify, your entire function is being run to completion when you run

$foo = Foo

That outputs the 10 write-host statements.

Inner 0
Inner 1
Inner 2
Inner 3
Inner 4
Inner 5
Inner 6
Inner 7
Inner 8
Inner 9

Then Select-Object -First 5 selects the 5 objects.

$foo | Select-Object -First 5

0
1
2
3
4

* Unless redirected to another stream
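
If you want to check this without relying on console output, a counter makes the difference measurable. This is just a sketch; Get-Numbers and $counter are made-up names standing in for Foo.

```powershell
# Shared counter; the function reads it from the parent scope and mutates it.
$counter = @{ Iterations = 0 }

function Get-Numbers {
    for ($i = 0; $i -lt 10; $i++) {
        $counter.Iterations++
        $i
    }
}

# Streaming: Select-Object stops the function after the fifth item.
$counter.Iterations = 0
$streamed = Get-Numbers | Select-Object -First 5
$streamedIterations = $counter.Iterations

# Assignment first: the function runs to completion before anything is piped.
$counter.Iterations = 0
$all = Get-Numbers
$buffered = $all | Select-Object -First 5
$bufferedIterations = $counter.Iterations

"streamed: $streamedIterations iterations, buffered: $bufferedIterations iterations"
```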

CodePudding user response:

PowerShell is doing what you tell it to. The loop has already run 10 times by the time the assignment statement completes, and the Write-Host output is not stored in $foo because Write-Host output never goes into the pipeline. The only way I can see to get the output you want is to have Select-Object stop the loop while it is still running:

$foo = Foo | Select-Object -First 5

Inner 0
Inner 1
Inner 2
Inner 3
Inner 4


$foo

0
1
2
3
4

1..10 | % { write-host num $_ } | select -first 5

num 1
num 2
num 3
num 4
num 5
num 6
num 7
num 8
num 9
num 10
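
All ten lines appear in that last example because Write-Host puts nothing in the pipeline, so Select-Object never receives a fifth object and never stops the loop. A small sketch with counters (the variable names are arbitrary) makes the contrast visible:

```powershell
$hostOnlyRuns = 0
$objectRuns = 0

# Write-Host emits nothing to the success stream, so Select-Object never sees
# an object and never gets the chance to stop the upstream loop.
$nothing = 1..10 |
    ForEach-Object { $hostOnlyRuns++; Write-Host "num $_" } |
    Select-Object -First 5

# Emitting $_ gives Select-Object something to count; it stops after five.
$five = 1..10 |
    ForEach-Object { $objectRuns++; $_ } |
    Select-Object -First 5

"host only: $hostOnlyRuns runs, objects: $objectRuns runs"
```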