Enumerating large powershell object variable (1 million plus members)

Time:11-16

I'm processing large amounts of data and after pulling the data and manipulating it, I have the results stored in memory in a variable.

I now need to split this data into separate variables. This was easily done by piping to Where-Object, but it has slowed down now that I have much more data (1 million plus members). Note: it takes about 5 minutes.

$DCEntries = $DNSQueries | ? {$_.ClientIP -in $DCs.ipv4address -Or $_.ClientIP -eq '127.0.0.1'}
$NonDCEntries = $DNSQueries | ? {$_.ClientIP -notin $DCs.ipv4address -And $_.ClientIP -ne '127.0.0.1'} 

#Note: 
#$DCs is an array of 60 objects of type Microsoft.ActiveDirectory.Management.ADDomainController, with two properties:  Name, ipv4address
#$DNSQueries is a collection of pscustomobjects that has 6 properties, all strings.

I immediately realized I'm enumerating $DNSQueries (the large object) twice, which is obviously costing me some time. So I tried a different approach: enumerating it only once and using a Switch statement. But this seems to have drastically INCREASED the run time, which is not what I was going for.

$DNSQueries | ForEach-Object {
    Switch ($_) {
        {$_.ClientIP -in $DCs.ipv4address -Or $_.ClientIP -eq '127.0.0.1'} {
            # Query is from a DC
            $DCEntries += $_
        }
        default {
            # Query is not from DC
            $NonDCEntries += $_
        }
    }
}

I'm wondering if someone can explain why the second version takes so much more time, and perhaps offer a better way to accomplish what I want.

Is it the ForEach-Object and/or the appending to the sub-variables that costs that much time?

Answer:

ForEach-Object is actually the slowest way to enumerate a collection. On top of that, there is a switch with a script-block condition, which adds even more overhead per item.

If the collection is already in memory, nothing can beat a foreach loop for linear enumeration.
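A rough way to see this on your own machine is to time both forms over the same in-memory array. This is a sketch: `$data`, `$viaPipeline` and `$viaStatement` are throwaway names, and the array is just integers. Absolute timings will differ between machines and PowerShell versions.

```powershell
# Throwaway in-memory collection used only for timing
$data = 1..100000
$sw = [System.Diagnostics.Stopwatch]::StartNew()

# Pipeline: ForEach-Object invokes its script block once per input object
$viaPipeline = $data | ForEach-Object { $_ * 2 }
$pipelineMs = $sw.Elapsed.TotalMilliseconds

# Language statement: iterates the array directly, no pipeline involved
$sw.Restart()
$viaStatement = foreach ($n in $data) { $n * 2 }
$statementMs = $sw.Elapsed.TotalMilliseconds
$sw.Stop()

'ForEach-Object   : {0:N2} ms' -f $pipelineMs
'foreach statement: {0:N2} ms' -f $statementMs
```

Both loops produce the same 100000 results. Only the enumeration mechanism differs.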

As for your biggest problem: you are using += to add elements to an array, which is a fixed-size collection. Each time a new element is added, PowerShell has to create a new array and copy all existing elements into it, which causes an extremely high amount of overhead. See this answer as well as this documentation for more details.
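You can observe the copy directly by keeping a second reference to an array before using += on it. The variable names here are illustrative only.

```powershell
# PowerShell arrays are .NET System.Array instances, which have a fixed size
$original = @(1, 2, 3)
$original.IsFixedSize                          # True

# Keep a second reference to the same array object
$alias = $original

# += does not grow the array in place: it allocates a new array of length 4,
# copies the three existing elements, adds the new one, and rebinds $original
$original += 4

$original.Count                                # 4
$alias.Count                                   # 3, the old array is untouched
[object]::ReferenceEquals($original, $alias)   # False
```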

In this case you can combine a Collections.Generic.List<T> with PowerShell's explicit assignment (assigning the output of a loop directly to a variable).

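$NonDCEntries = [Collections.Generic.List[object]]::new()

# Whatever the loop body outputs is collected into $DCEntries
$DCEntries = foreach($item in $DNSQueries) {
    if($item.ClientIP -in $DCs.IPv4Address -Or $item.ClientIP -eq '127.0.0.1') {
        $item      # emitted as output, so it lands in $DCEntries
        continue   # skip the .Add() below
    }
    # List<T>.Add() grows the list in place and returns nothing
    $NonDCEntries.Add($item)
}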
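One further tweak you could try is not part of the approach above, so treat it as a sketch. `-in $DCs.IPv4Address` re-runs member enumeration over all DC objects and then does a linear scan for every one of the 1M+ queries. Building a HashSet[string] of the DC addresses once turns each check into a hash lookup. The `$DCs` and `$DNSQueries` below are hypothetical stand-ins that carry only the properties used here.

```powershell
# Hypothetical stand-ins for the real data (only the properties used below)
$DCs = @(
    [pscustomobject]@{ Name = 'DC01'; IPv4Address = '10.0.0.10' }
    [pscustomobject]@{ Name = 'DC02'; IPv4Address = '10.0.0.11' }
)
$DNSQueries = @(
    [pscustomobject]@{ ClientIP = '10.0.0.10' }
    [pscustomobject]@{ ClientIP = '192.168.1.50' }
    [pscustomobject]@{ ClientIP = '127.0.0.1' }
    [pscustomobject]@{ ClientIP = '192.168.1.51' }
)

# Build the lookup set once: all DC addresses plus loopback.
# OrdinalIgnoreCase mirrors the case-insensitive comparison that -in uses.
[string[]] $dcIPs = $DCs.IPv4Address
$comparer = [StringComparer]::OrdinalIgnoreCase
$dcAddresses = [Collections.Generic.HashSet[string]]::new($dcIPs, $comparer)
[void] $dcAddresses.Add('127.0.0.1')   # HashSet.Add returns a bool; discard it

$NonDCEntries = [Collections.Generic.List[object]]::new()

# Single pass: DC queries are emitted into $DCEntries, the rest go to the list
$DCEntries = foreach ($item in $DNSQueries) {
    if ($dcAddresses.Contains($item.ClientIP)) {
        $item
        continue
    }
    $NonDCEntries.Add($item)
}
```

With about 60 DCs, the saving per item is small. Across more than a million items, it avoids rebuilding and scanning the address list on every iteration.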

To put into perspective how badly += to an array scales, you can test this code:

$Tests = [ordered]@{
    'PowerShell Explicit Assignment' = {
        $result = foreach($i in 1..$count) {
            $i
        }
    }
    '+= Operator to System.Array' = {
        $result = @( )
        foreach($i in 1..$count) {
            $result += $i
        }
    }
    '.Add(..) to List<T>' = {
        $result = [Collections.Generic.List[int]]::new()
        foreach($i in 1..$count) {
            $result.Add($i)
        }
    }
}

foreach($count in 1000, 10000, 100000) {
    foreach($test in $Tests.GetEnumerator()) {
        $measurement = (Measure-Command { & $test.Value }).TotalMilliseconds
        $totalRound  = [math]::Round($measurement, 2).ToString() + ' ms'

        [pscustomobject]@{
            CollectionSize    = $count
            Test              = $test.Key
            TotalMilliseconds = $totalRound
        }
    }
}

On my laptop, this yields the following results:

CollectionSize Test                           TotalMilliseconds
-------------- ----                           -----------------
          1000 PowerShell Explicit Assignment 15.9 ms
          1000 += Operator to System.Array    26.88 ms
          1000 .Add(..) to List<T>            12.47 ms
         10000 PowerShell Explicit Assignment 1.07 ms
         10000 += Operator to System.Array    2488.24 ms
         10000 .Add(..) to List<T>            0.9 ms
        100000 PowerShell Explicit Assignment 16.07 ms
        100000 += Operator to System.Array    308931.8 ms
        100000 .Add(..) to List<T>            8.39 ms
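Strictly speaking, the growth is quadratic rather than exponential. If adding the k-th element copies the k - 1 elements already in the array, then building an n-element array with += copies this many elements in total:

$$\sum_{k=1}^{n} (k - 1) = \frac{n(n-1)}{2}$$

So 10 times more elements means roughly 100 times more copying. For n = 100000, that is about 5 × 10^9 element copies. In the table above, going from 10000 to 100000 elements took the += case from 2488.24 ms to 308931.8 ms, about 124 times slower. The other two approaches grew only about 9 to 15 times over the same step.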