PowerShell, test the performance/efficiency of asynchronous tasks with Start-Job and Start-Process

Time:10-09

I'm curious to test out the performance/usefulness of asynchronous tasks in PowerShell with Start-ThreadJob, Start-Job and Start-Process. I have a folder with about 100 zip files and so came up with the following test:

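$dir      = (New-Item "000" -ItemType Directory -Force).FullName   # the old zip files are moved in here
$zipfiles = (Get-ChildItem -Filter *.zip).FullName   # assumed: full paths (5.1 jobs don't start in the current directory)
foreach ($i in $zipfiles) {
    Start-Job -ScriptBlock {
        # $using: copies the caller's variables into the job's separate process
        $name = ($using:i -split '\.zip$')[0]
        7z.exe x "-o$name" $using:i
        Move-Item $using:i $using:dir -Force
        7z.exe a $using:i "$name\*"
    }
}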

The problem with this is that it would start jobs for all 100 zip files at once, which would probably be too much. Instead, I want to set a value $numjobs (say 5) that I can change, so that only $numjobs jobs run at the same time; the script should then wait for all of them to finish before starting the next block of 5. I'd then like to watch CPU and memory usage for different values of $numjobs.

How would I tell a loop only to run 5 times, then wait for the Jobs to finish before continuing?

I see that it's easy to wait for jobs to finish

$jobs = $commands | Foreach-Object { Start-ThreadJob $_ }
$jobs | Receive-Job -Wait -AutoRemoveJob

but how might I wait for Start-Process tasks to end?

Although I would like to use ForEach-Object -Parallel, I expect the enterprises I work in to stay tied to PowerShell 5.1 for the next 3-4 years, with no chance to install PowerShell 7.x (although I'd be curious to test ForEach-Object -Parallel on my home system to compare all approaches).

Answer:

You could add a counter to your foreach loop and break once the counter reaches your desired value:

$numjobs = 5
$counter = 0
foreach ($i in $zipfiles) {
  if ($counter -ge $numjobs) {
    break
  }
  # <your code>
  $counter++
}

or with PowerShell's ForEach-Object:

$numjobs = 5
$zipfiles | Select-Object -First $numjobs | ForEach-Object {
  # <your code>
}

If you want to process the whole array in batches and wait for each batch to complete, you have to save the job objects returned by Start-Job and pass them to Wait-Job, like this:

$items = 1..100

$batchsize = 5

while ($true) {
    $jobs = @()
    $counter = 0
    foreach ($i in $items) {
        if ($counter -ge $batchsize) {
            break
        }
        # keep the job object so the batch can be waited on later
        $jobs += Start-Job -ScriptBlock { Start-Sleep 10 }
        $counter++
    }
    # block until every job in this batch has finished, then clean them up
    $jobs | Wait-Job | Remove-Job
    # drop the processed items (slice indices past the end are ignored)
    $items = $items[$batchsize..($items.Length)]
    if (!$items) {
        break
    }
}

By design, arrays have a fixed length, which is why I'm reassigning the whole array with $items = $items[$batchsize..($items.Length)] instead of removing items from it.
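Fixed batches leave slots idle while the slowest job of each batch finishes. A different technique (not the one above) keeps up to N jobs running at all times: wait for any single job with Wait-Job -Any, collect its output, then start the next one. A minimal sketch for Windows PowerShell 5.1; Invoke-ThrottledJob and its parameters are hypothetical names:

```powershell
# Hypothetical helper: keep at most $Limit Start-Job jobs running at once.
# Instead of waiting for a whole batch, it waits for any job to finish
# (Wait-Job -Any) and then starts the next one.
function Invoke-ThrottledJob {
    param(
        [object[]]    $InputObject,   # one job per item
        [scriptblock] $ScriptBlock,   # receives the item as its first parameter
        [int]         $Limit = 5
    )
    $running = [System.Collections.Generic.List[object]]::new()

    foreach ($item in $InputObject) {
        # block until fewer than $Limit jobs are running
        while ($running.Count -ge $Limit) {
            # @() in case more than one job is reported as finished
            foreach ($job in @(Wait-Job -Job $running.ToArray() -Any)) {
                $null = $running.Remove($job)
                Receive-Job -Job $job -Wait -AutoRemoveJob   # emit output, delete the job
            }
        }
        $running.Add((Start-Job -ScriptBlock $ScriptBlock -ArgumentList $item))
    }

    # drain the jobs still running after the last item was started
    foreach ($job in $running) {
        Receive-Job -Job $job -Wait -AutoRemoveJob
    }
}
```

Results come back in completion order, not input order. For the zip scenario, -InputObject would be $zipfiles as path strings (objects passed to a job are serialized) and the script block would start with param($zip).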

Answer:

ForEach-Object -Parallel and Start-ThreadJob have built-in functionality to limit the number of threads running at the same time. The same applies to runspaces and their RunspacePool, which is what both cmdlets use behind the scenes. Start-Job does not offer this, because each job runs in a separate process, whereas the cmdlets above run in separate threads of the same process. Personally, I would not consider Start-Job a parallelism alternative: it is fairly slow, and in most cases a linear loop will be faster.
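The claim that Start-Job is slow can be checked on your own machine, which also fits the question's goal of testing performance. Below is a rough timing sketch that times a sequential loop against one Start-Job per item. Compare-JobOverhead is a hypothetical name, and Start-Sleep is only a stand-in for the real work (such as one 7z.exe call):

```powershell
# Rough timing sketch: sequential loop vs. one Start-Job per item.
# Start-Sleep is a placeholder for the real work.
function Compare-JobOverhead {
    param(
        [int] $Count   = 10,   # number of work items
        [int] $SleepMs = 100   # simulated work per item, in milliseconds
    )

    # all items run one after the other in the current process
    $sequential = Measure-Command {
        foreach ($i in 1..$Count) { Start-Sleep -Milliseconds $SleepMs }
    }

    # every item gets its own background job (a separate PowerShell process)
    $background = Measure-Command {
        $jobs = foreach ($i in 1..$Count) {
            Start-Job -ScriptBlock { param($ms) Start-Sleep -Milliseconds $ms } -ArgumentList $SleepMs
        }
        $jobs | Wait-Job | Remove-Job
    }

    [pscustomobject]@{
        Sequential = $sequential
        StartJob   = $background
    }
}
```

For example, Compare-JobOverhead -Count 20 -SleepMs 500 compares 20 half-second work items. How the two numbers relate depends on how long each work item runs compared to the cost of starting a process per job, so results will vary by machine and workload.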

How to limit the number of running threads?

Both cmdlets offer the -ThrottleLimit parameter for this.

How would the code look?
$dir = (New-Item "000" -ItemType Directory -Force).FullName

# ForEach-Object -Parallel (PowerShell 7+)
$zipfiles | ForEach-Object -Parallel {
    $name = ($_ -split '\.zip$')[0]
    7z.exe x "-o$name" $_
    Move-Item $_ $using:dir -Force
    7z.exe a $_ "$name\*"
} -ThrottleLimit 5

# Start-ThreadJob (ThreadJob module)
$jobs = foreach ($i in $zipfiles) {
    Start-ThreadJob {
        $name = ($using:i -split '\.zip$')[0]
        7z.exe x "-o$name" $using:i
        Move-Item $using:i $using:dir -Force
        7z.exe a $using:i "$name\*"
    } -ThrottleLimit 5
}
$jobs | Receive-Job -Wait -AutoRemoveJob

How to achieve the same with only PowerShell 5.1 available and no ability to install new modules?

A RunspacePool offers the same functionality, either through its .SetMaxRunspaces(Int32) method or through one of the RunspaceFactory.CreateRunspacePool overloads that take a maxRunspaces argument.

How would the code look?
$dir   = (New-Item "000" -ItemType Directory -Force).FullName
$limit = 5
$iss   = [initialsessionstate]::CreateDefault2()
$pool  = [runspacefactory]::CreateRunspacePool(1, $limit, $iss, $Host)
$pool.ThreadOptions = [Management.Automation.Runspaces.PSThreadOptions]::ReuseThread
$pool.Open()

$tasks  = foreach ($i in $zipfiles) {
    $ps = [powershell]::Create().AddScript({
        param($path, $dir)

        $name = ($path -split '\.zip$')[0]
        7z.exe x "-o$name" $path
        Move-Item $path $dir -Force
        7z.exe a $path "$name\*"
    }).AddParameters(@{ path = $i; dir = $dir })
    $ps.RunspacePool = $pool

    @{ Instance = $ps; AsyncResult = $ps.BeginInvoke() }
}

foreach($task in $tasks) {
    $task['Instance'].EndInvoke($task['AsyncResult'])
    $task['Instance'].Dispose()
}
$pool.Dispose()
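To see how the runspace limit affects run time (the question wants to compare different values of $numjobs), the pool pattern above can be wrapped in a small timing function. This sketch uses the SetMaxRunspaces method instead of the CreateRunspacePool overload; Measure-RunspacePool is a hypothetical name and Start-Sleep is a placeholder for the real work:

```powershell
# Sketch: run $Count placeholder tasks (each sleeps $SleepMs) through a
# RunspacePool limited with SetMaxRunspaces, and time the whole run.
function Measure-RunspacePool {
    param([int] $Limit = 5, [int] $Count = 20, [int] $SleepMs = 200)

    $pool = [runspacefactory]::CreateRunspacePool([initialsessionstate]::CreateDefault2())
    $null = $pool.SetMaxRunspaces($Limit)   # returns $false if the value is rejected
    $pool.Open()
    try {
        $sw = [Diagnostics.Stopwatch]::StartNew()
        $tasks = foreach ($i in 1..$Count) {
            $ps = [powershell]::Create().AddScript({
                param($ms)
                Start-Sleep -Milliseconds $ms
            }).AddArgument($SleepMs)
            $ps.RunspacePool = $pool
            # tasks beyond $Limit are queued by the pool until a runspace is free
            @{ Instance = $ps; AsyncResult = $ps.BeginInvoke() }
        }
        foreach ($task in $tasks) {
            $null = $task.Instance.EndInvoke($task.AsyncResult)
            $task.Instance.Dispose()
        }
        $sw.Stop()
        $sw.Elapsed
    }
    finally {
        $pool.Dispose()
    }
}
```

With -Limit 1 the tasks run one at a time, so 20 tasks of 200 ms take at least 4 seconds; raising -Limit should shorten the run until pool overhead or the CPU becomes the bottleneck. Watching Task Manager or Get-Process while it runs covers the CPU and memory side.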

Note that in all examples it's unclear whether the 7-Zip commands are correct; this answer demonstrates how async is done in PowerShell, not how to zip files or folders.
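For the Start-Process part of the question: Start-Process returns nothing by default, but with -PassThru it returns a System.Diagnostics.Process object whose WaitForExit() method blocks until the process ends (the Wait-Process cmdlet can also wait on such objects). A minimal batching sketch built on that; Invoke-ProcessBatch and its parameters are hypothetical:

```powershell
# Hypothetical helper: start one process per argument string, at most
# $BatchSize at a time, and wait for the whole batch before starting the next.
function Invoke-ProcessBatch {
    param(
        [string]   $FilePath,        # e.g. '7z.exe'
        [string[]] $ArgumentLists,   # one argument string per process
        [int]      $BatchSize = 5
    )
    for ($start = 0; $start -lt $ArgumentLists.Count; $start += $BatchSize) {
        $end = [Math]::Min($start + $BatchSize, $ArgumentLists.Count) - 1

        # -PassThru makes Start-Process return a Process object
        $batch = foreach ($argList in $ArgumentLists[$start..$end]) {
            Start-Process -FilePath $FilePath -ArgumentList $argList -NoNewWindow -PassThru
        }

        # block until every process in this batch has exited
        foreach ($p in $batch) { $p.WaitForExit() }

        $batch   # emit the finished Process objects
    }
}
```

For the zip files, -FilePath would be 7z.exe and each argument string one 7z command line built from $zipfiles. Unlike Start-Job, this launches 7z.exe directly instead of an extra PowerShell process per item.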
