My first array is list of pet owners and their mobile numbers etc. Second array is pets with the information of owner name. Pet list is roughly 1k and owner list is 3k.
What is the fastest way to add the owner information into the pets information? Currently my script takes almost a minute to run which seems a bit too much.
foreach ($pet in $pets) {
$owner = $owners | Where-Object { $_.name -eq $pet.owner }
if ($owner) {
$pet | Add-Member -MemberType NoteProperty -Name "Owner" -Value $owner.name }
}
CodePudding user response:
Doing a linear lookup in the owners array for every pet is what makes your approach fundamentally slow: for every pet, 3,000 owner objects must be searched through, resulting in 1,000 x 3,000 = 3 million lookups.
Performance additionally suffers from incidental implementation choices you made:
As powerful and elegant as the PowerShell pipeline is, its one by one streaming is usually noticeably slower than iterating over an array in an expression / language statement, such as a
foreach
loop.Additionally, as of PowerShell 7.2.2, the
Where-Object
andForEach-Object
cmdlets are inefficiently implemented, which adds additional overhead - see GitHub issue #10982 and this answer.A functional limitation of
Where-Object
that additionally worsens performance is that there is no way to stop enumerating once a match has been found; that is, the input is invariably processed in full, and all matches are output.By contrast, the analogous
.Where()
array method does offer a way to stop processing once the first match is found (e.g.,(1..10).Where({ $_ -ge 5 }, 'First')
. Potentially bringing the same functionality to theWhere-Object
cmdlet is the subject of GitHub issue #13834.
Therefore, you have two choices:
(A) Pragmatic solution: stick with the fundamentally inefficient approach, but improve implementation efficiency so that the resulting performance may be good enough:
- The (A) solution below is substantially faster than your original,
Where-Object
-based approach, about 33 to 38 times, depending on the PowerShell edition used; see the next section for benchmarks.
- The (A) solution below is substantially faster than your original,
(B) Proper, scalable, much better-performing, but more complex solution: Use an aux. data structure that enables efficient lookup of owner objects by name, such as a hashtable, as suggested by Darin.
- The (B) solution below is about 7 to 13 times faster than the (A) solution, depending on the PowerShell edition used, and therefore ca. 260 to 420 (!) times faster than the
Where-Object
solution; see the next section for benchmarks.
- The (B) solution below is about 7 to 13 times faster than the (A) solution, depending on the PowerShell edition used, and therefore ca. 260 to 420 (!) times faster than the
Note:
In the code below, I've modified your example so that a property other than the owner name is added as a new property to each pet object (
.Address
, as.OwnerAddress
), given that the owner name is present to begin with.Also, the
-MemberType NoteProperty
,-Name
and-Value
parameter names in theAdd-Member
call are omitted for brevity (they are implied).
Solution (A): Replace the Where-Object
pipeline with an (inner) foreach
statement:
# Create 1000 sample pets and 3000 sample owners.
$pets = foreach ($i in 1..1000) { [pscustomobject] @{ Name = "Pet $i"; Owner = 'Owner {0}' -f (6 * $i) } }
$owners = foreach ($i in 1..3000) { [pscustomobject] @{ Name = "Owner $i"; Address = "Address $i" } }
foreach ($pet in $pets) {
# Perform the lookup more efficiently via an inner `foreach` loop.
$owner = foreach ($o in $owners) { if ($o.Name -eq $pet.Owner) { $o; break } }
if ($owner) {
Add-Member -InputObject $pet OwnerAddress $owner.Address
}
}
Solution (B): Create a hashtable that maps owner names to owner objects, for efficient lookup:
# Create 1000 sample pets and 3000 sample owners.
$pets = foreach ($i in 1..1000) { [pscustomobject] @{ Name = "Pet $i"; Owner = 'Owner {0}' -f (6 * $i) } }
$owners = foreach ($i in 1..3000) { [pscustomobject] @{ Name = "Owner $i"; Address = "Address $i" } }
# Create a hashtable that maps owner names to owner objects,
# for efficient lookup by name.
$ownerMap = @{}; foreach ($owner in $owners) { $ownerMap[$owner.Name] = $owner }
foreach ($pet in $pets) {
# Look up the pet's owner in the owner map (hashtable); returns $null if not found.
$owner = $ownerMap[$pet.Owner]
if ($owner) {
Add-Member -InputObject $pet OwnerAddress $owner.Address
}
}
Benchmarks
Below are sample timings comparing the three approaches, averaged over 10 runs.
Timing commands is never an exact science in PowerShell, and performance varies based on many factors, not least the hardware with respect to absolute times, but the results below provide a sense of relative performance, as reflected in the
Factor
output column:1.00
denotes the fastest command, listed first, with the slower ones expressed as multiples of it, in descending order of speed.The bottom sections contains the source code, allowing you to run these benchmarks yourself
Caveat: With the given collection sizes, these benchmarks run for quite a while (up to 10 minutes and more), mostly due to how slow the
Where-Object
solution is.For best results, run the benchmarks while your machine isn't (too) busy doing other things.
Note how performance seems to have improved substantially overall in the cross-platform PowerShell (Core) edition, relative to Windows PowerShell.
Windows PowerShell 5.1 on Windows 10:
Factor Secs (10-run avg.) Command
------ ------------------ -------
1.00 0.234 # Hashtable-assisted lookups....
6.85 1.605 # Nested foreach statements...
261.95 61.353 # Pipeline with Where-Object...
PowerShell (Core) 7.2.2 on Windows 10:
Factor Secs (10-run avg.) Command
------ ------------------ -------
1.00 0.096 # Hashtable-assisted lookups.…
12.70 1.216 # Nested foreach statements…
424.40 40.624 # Pipeline with Where-Object…
Benchmark source code:
The following benchmark code uses function
Time-Command
from this Gist.Unless already present, you are prompted to automatically download and define this function in your session. (I can personally assure you that doing that is safe, but you should always check the source code yourself.)
# Download and define function `Time-Command` on demand (will prompt).
# To be safe, inspect the source code at the specified URL first.
if (-not (Get-Command -ErrorAction Ignore Time-Command)) {
$gistUrl = 'https://gist.github.com/mklement0/9e1f13978620b09ab2d15da5535d1b27/raw/Time-Command.ps1'
if ((Read-Host "`n====`n OK to download and define benchmark function ``Time-Command```n from Gist ${gistUrl}?`n=====`n(y/n)?").Trim() -notin 'y', 'yes') { Write-Warning 'Aborted.'; exit 2 }
Invoke-RestMethod $gistUrl | Invoke-Expression
if (-not ${function:Time-Command}) { exit 2 }
}
# Define the collection sizes
$petCount = 1000
$ownerCount = 3000
# Define a sample owners array.
$owners = foreach ($i in 1..$ownerCount) { [pscustomobject] @{ Name = "Owner $i"; Address = "Address $i" } }
# Define a script block that creates a sample pets array.
# Note: We use a script block, because the array must be re-created
# for each run, since the pet objects get modified.
$petGenerator = {
foreach ($i in 1..$petCount) { [pscustomobject] @{ Name = "Pet $i"; Owner = 'Owner {0}' -f (6 * $i) } }
}
# Define script blocks with the commands to time.
$commands = @(
{ # Nested foreach statements
$pets = & $petGenerator
foreach ($pet in $pets) {
$owner = foreach ($o in $owners) { if ($o.Name -eq $pet.Owner) { $o; break } }
if ($owner) {
Add-Member -ea stop -InputObject $pet OwnerAddress $owner.Address
}
}
},
{ # Pipeline with Where-Object
$pets = & $petGenerator
foreach ($pet in $pets) {
$found = $owners | Where-Object { $_.name -eq $pet.Owner }
if ($found) {
Add-Member -InputObject $pet OwnerAddress $owner.Address
}
}
},
{ # Hashtable-assisted lookups.
$pets = & $petGenerator
$ownerMap = @{}; foreach ($owner in $owners) { $ownerMap[$owner.Name] = $owner }
foreach ($pet in $pets) {
$owner = $ownerMap[$pet.Owner]
if ($owner) {
Add-Member -InputObject $pet OwnerAddress $owner.Address
}
}
}
)
Write-Verbose -Verbose 'Running benchmarks...'
# Average 10 runs.
# Add -OutputToHost to print script-block output, if desired.
Time-Command -Count 10 $commands