I'm curious to test out the performance/usefulness of asynchronous tasks in PowerShell with Start-ThreadJob, Start-Job and Start-Process. I have a folder with about 100 zip files and so came up with the following test:
New-Item "000" -ItemType Directory -Force # Move the old zip files in here
foreach ($i in $zipfiles) {
$name = $i -split ".zip"
Start-Job -scriptblock {
7z.exe x -o"$name" .\$name
Move-Item $i 000\ -Force
7z.exe a $i .\$name\*.*
}
}
The problem with this is that it would start jobs for all 100 zip files at once, which would probably be too much. I want to set a value $numjobs, say 5, which I can change, such that only $numjobs jobs are started at the same time, and the script then waits for all 5 of those jobs to end before the next block of 5 starts. I'd like to then watch the CPU and memory usage depending on the value of $numjobs.
How would I tell a loop only to run 5 times, then wait for the Jobs to finish before continuing?
I see that it's easy to wait for jobs to finish
$jobs = $commands | Foreach-Object { Start-ThreadJob $_ }
$jobs | Receive-Job -Wait -AutoRemoveJob
but how might I wait for Start-Process tasks to end?
Although I would like to use ForEach-Object -Parallel, the enterprises that I work in will be solidly tied to PowerShell 5.1 for the next 3-4 years, I expect, with no chance to install PowerShell 7.x (although I would be curious to test with ForEach-Object -Parallel on my home system to compare all approaches).
CodePudding user response:
You could add a counter to your foreach loop and break when the counter reaches your desired value:
$numjobs = 5
$counter = 0
foreach ($i in $zipfiles) {
# Stop once $numjobs iterations have run
if ($counter -ge $numjobs) {
break
}
$counter++
<your code>
}
or with PowerShell's ForEach-Object:
$numjobs = 5
$zipfiles | select -first $numjobs | Foreach-Object {
<your code>
}
If you want to process the whole array in batches and wait for each batch to complete, you have to save the object that is returned by Start-Job and pass it to Wait-Job, like this:
$items = 1..100
$batchsize = 5
while ($true) {
$jobs = @()
$counter = 0
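foreach ($i in $items) {
if ($counter -ge $batchsize) {
break
}
# Collect every job of this batch so all of them can be waited on
$jobs += Start-Job -ScriptBlock { Start-Sleep 10 }
$counter++
}
# Drop the items that were just started; this yields nothing once every item has been handled
$items = $items[$batchsize..($items.Length)]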
foreach ($job in $jobs) {
$job | Wait-Job | Out-Null
}
if (!$items) {
break
}
}
Arrays have fixed lengths by design, which is why I'm rewriting the whole array with $items = $items[$batchsize..($items.Length)].
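The same batch-and-wait pattern can be applied to Start-Process, which the question also asks about. The following is only a minimal sketch: it assumes that -PassThru is used so that Start-Process returns a process object, and that Wait-Process is an acceptable way to block until a batch has exited. The workload (relaunching the current PowerShell executable with `exit 0`) and the names `$exe`, `$batchSize` and `$started` are placeholders chosen for illustration, not part of the original answer.

```powershell
# Hypothetical stand-in workload: relaunch the current PowerShell executable.
# In the real script this would be 7z.exe with its arguments.
$exe       = (Get-Process -Id $PID).Path
$items     = 1..6
$batchSize = 3
$started   = 0

for ($start = 0; $start -lt $items.Count; $start += $batchSize) {
    # Last index of the current batch, clamped to the end of the array
    $end = [Math]::Min($start + $batchSize, $items.Count) - 1

    # -PassThru makes Start-Process return the process object instead of nothing
    $procs = foreach ($i in $items[$start..$end]) {
        Start-Process -FilePath $exe -ArgumentList '-NoProfile', '-Command', 'exit 0' -PassThru
    }

    # Block until every process of this batch has exited
    $procs | Wait-Process
    $started += @($procs).Count
}
```

Calling `.WaitForExit()` on each returned process object should work as well; Wait-Process is used here only because it accepts the whole batch from the pipeline.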
CodePudding user response:
ForEach-Object -Parallel and Start-ThreadJob have built-in functionality to limit the number of threads that can run at the same time. The same applies to runspaces with their RunspacePool, which is what both cmdlets use behind the scenes. Start-Job does not offer such functionality because each job runs in a separate process, as opposed to the cmdlets mentioned before, which run in different threads all in the same process. I would also personally not consider it a parallelism alternative: it is pretty slow, and in most cases a linear loop will be faster.
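If Start-Job has to be used anyway, a throttle can be approximated by hand. The sketch below is one possible approach rather than a built-in feature: it assumes that keeping a list of running jobs and calling Wait-Job -Any whenever the list is full is acceptable. The names `$limit`, `$running` and `$results`, and the doubling workload, are made up for illustration.

```powershell
$limit   = 2        # maximum number of jobs allowed to run at once (illustrative value)
$items   = 1..5
$running = @()
$results = @()

foreach ($i in $items) {
    if ($running.Count -ge $limit) {
        # Wait until at least one of the running jobs has finished
        $done = @($running | Wait-Job -Any)
        $results += $done | Receive-Job
        $done | Remove-Job
        # Keep only the jobs that are still outstanding
        $running = @($running | Where-Object { $done.Id -notcontains $_.Id })
    }
    # Placeholder workload: sleep briefly, then return the doubled input
    $running += Start-Job -ScriptBlock {
        param($n)
        Start-Sleep -Milliseconds 200
        $n * 2
    } -ArgumentList $i
}

# Collect the output of whatever is still running at the end
$results += $running | Receive-Job -Wait -AutoRemoveJob
```

Unlike fixed batches, this starts a new job as soon as any slot frees up, so one slow item does not hold back the rest of its batch. The per-process overhead of Start-Job described above still applies.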
How to limit the number of running threads?
Both cmdlets offer the -ThrottleLimit parameter for this.
- https://learn.microsoft.com/en-us/powershell/module/threadjob/start-threadjob?view=powershell-7.2#-throttlelimit
- https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/foreach-object?view=powershell-7.2#-throttlelimit
How would the code look?
$dir = (New-Item "000" -ItemType Directory -Force).FullName
# ForEach-Object -Parallel
$zipfiles | ForEach-Object -Parallel {
$name = ($_ -split '\.zip')[0]
7z.exe x -o $name .\$name
Move-Item $_ $using:dir -Force
7z.exe a $_ .\$name\*.*
} -ThrottleLimit 5
# Start-ThreadJob
$jobs = foreach ($i in $zipfiles) {
Start-ThreadJob {
$name = ($using:i -split '\.zip')[0]
7z.exe x -o $name .\$name
Move-Item $using:i $using:dir -Force
7z.exe a $using:i .\$name\*.*
} -ThrottleLimit 5
}
$jobs | Receive-Job -Wait -AutoRemoveJob
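To observe the effect of -ThrottleLimit without involving 7-Zip, a timing sketch such as the following may help. It assumes the ThreadJob module is available (it is not part of a stock PowerShell 5.1 installation), and the sleep-based workload and the names `$sw` and `$output` are placeholders for illustration.

```powershell
# Six jobs that each sleep for half a second, with at most two running at a time
$sw = [System.Diagnostics.Stopwatch]::StartNew()

$jobs = foreach ($n in 1..6) {
    Start-ThreadJob -ScriptBlock {
        Start-Sleep -Milliseconds 500
        $using:n          # emit the input so the output can be counted afterwards
    } -ThrottleLimit 2
}

# Wait for all jobs, collect their output and remove them
$output = $jobs | Receive-Job -Wait -AutoRemoveJob
$sw.Stop()

'{0} results in {1:N1} s' -f @($output).Count, $sw.Elapsed.TotalSeconds
```

With a limit of 2 the six jobs should run in roughly three waves, so the elapsed time should grow as the limit is lowered and shrink as it is raised, which is one way to compare values of $numjobs.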
How to achieve the same having only PowerShell 5.1 available and no ability to install new modules?
The RunspacePool offers this same functionality, either with its .SetMaxRunspaces(Int32) method or by targeting one of the RunspaceFactory.CreateRunspacePool method overloads that offer a maxRunspaces limit.
How would the code look?
$dir = (New-Item "000" -ItemType Directory -Force).FullName
$limit = 5
$iss = [initialsessionstate]::CreateDefault2()
$pool = [runspacefactory]::CreateRunspacePool(1, $limit, $iss, $Host)
$pool.ThreadOptions = [Management.Automation.Runspaces.PSThreadOptions]::ReuseThread
$pool.Open()
$tasks = foreach ($i in $zipfiles) {
$ps = [powershell]::Create().AddScript({
param($path, $dir)
$name = ($path -split '\.zip')[0]
7z.exe x -o $name .\$name
Move-Item $path $dir -Force
7z.exe a $path .\$name\*.*
}).AddParameters(@{ path = $i; dir = $dir })
$ps.RunspacePool = $pool
@{ Instance = $ps; AsyncResult = $ps.BeginInvoke() }
}
foreach($task in $tasks) {
$task['Instance'].EndInvoke($task['AsyncResult'])
$task['Instance'].Dispose()
}
$pool.Dispose()
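The code above sets the limit through a CreateRunspacePool overload. The .SetMaxRunspaces(Int32) route could look roughly like the sketch below, which assumes the limit can be set on a pool created with default settings before it is opened. The doubling workload and the names `$results` and `$max` are placeholders for illustration.

```powershell
# Create a pool with default settings, then raise its upper limit explicitly
$pool = [runspacefactory]::CreateRunspacePool()
[void]$pool.SetMaxRunspaces(3)   # returns a Boolean indicating success, discarded here
$pool.Open()

$tasks = foreach ($n in 1..6) {
    # Placeholder workload: sleep briefly, then return the doubled input
    $ps = [powershell]::Create().AddScript({
        param($n)
        Start-Sleep -Milliseconds 200
        $n * 2
    }).AddArgument($n)
    $ps.RunspacePool = $pool
    @{ Instance = $ps; AsyncResult = $ps.BeginInvoke() }
}

# EndInvoke blocks until the task has finished and returns its output
$results = foreach ($task in $tasks) {
    $task['Instance'].EndInvoke($task['AsyncResult'])
    $task['Instance'].Dispose()
}

$max = $pool.GetMaxRunspaces()
$pool.Dispose()
```

Checking the value returned by SetMaxRunspaces, instead of discarding it, would reveal whether the requested limit was actually accepted.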
Note that for all examples, it's unclear whether the 7zip code is correct. This answer attempts to demonstrate how async is done in PowerShell, not how to zip files or folders.