PowerShell: Intersection of more than two arrays

Time:06-14

Using PowerShell, I have 14 arrays of strings, some of which are empty. How can I get the intersection of the non-empty arrays, that is, all elements that exist in every one of them? I am trying to avoid comparing two arrays at a time.

Any ideas on how I would approach this? Thank you.

$a = @('hjiejnfnfsd','test','huiwe','test2')
$b = @('test','jnfijweofnew','test2')
$c = @('njwifqbfiwej','test','jnfijweofnew','test2')
$d = @('bhfeukefwgu','test','dasdwdv','test2','hfuweihfei')
$e = @('test','ddwadfedgnh','test2')
$f = @('test','test2')
$g = @('test','bjiewbnefw','test2')
$h = @('uie287278hfjf','test','huiwhiwe','test2')
$i = @()
$j = @()
$k = @('jireohngi','test','gu7y8732hbj','test2')
$l = @()
$m = @('test','test2')
$n = @('test','test2')

My attempt to solve this (although it does not check for empty arrays):

$overlap = Compare-Object $a $b -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $c -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $d -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $e -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $f -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $g -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $h -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $i -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $j -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $k -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $l -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $m -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $n -PassThru -IncludeEqual -ExcludeDifferent

My desired result is that test and test2 appear in $overlap. This solution does not work because it does not check whether the array being compared is empty.

CodePudding user response:

Note: The following assumes that no individual array contains the same string more than once (more work would be needed to address that).

$a = @('hjiejnfnfsd','test','huiwe','test2')
$b = @('test','jnfijweofnew','test2')
$c = @('njwifqbfiwej','test','jnfijweofnew','test2')
$d = @('bhfeukefwgu','test','dasdwdv','test2','hfuweihfei')
$e = @('test','ddwadfedgnh','test2')
$f = @('test','test2')
$g = @('test','bjiewbnefw','test2')
$h = @('uie287278hfjf','test','huiwhiwe','test2')
$i = @()
$j = @()
$k = @('jireohngi','test','gu7y8732hbj','test2')
$l = @()
$m = @('test','test2')
$n = @('test','test2')

$allArrays = $a, $b, $c, $d, $e, $f, $g, $h, $i, $j, $k, $l, $m, $n

# Initialize a hashtable in which we'll keep
# track of unique strings and how often they occur.
$ht = @{}

# Loop over all arrays.
foreach ($arr in $allArrays) {
  # Loop over each array's elements.
  foreach ($el in $arr) {
    # Add each string and increment its occurrence count.
    $ht[$el] += 1
  }
}

# Output all strings that occurred in every non-empty array
$ht.GetEnumerator() |
  Where-Object Value -eq ($allArrays | Where-Object Count -gt 0).Count |
  ForEach-Object Key

The above outputs those strings that are present in all of the non-empty input arrays:

test2
test
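
If an array might contain the same string more than once, one way to handle it is to count each string at most once per array. The following is a minimal sketch under that assumption; the sample data and the variable names $arrays, $nonEmpty, $counts and $result are illustrative only. Sort-Object -Unique is used for deduplication because, like hashtable keys, it ignores case by default.

```powershell
# Hypothetical sample data: 'test' appears twice in the first array.
$arrays = @('test', 'test', 'x'), @('test', 'y'), @(), @('z', 'test')

# Keep only the non-empty arrays.
$nonEmpty = @($arrays | Where-Object { $_.Count -gt 0 })

$counts = @{}
foreach ($arr in $nonEmpty) {
    # Deduplicate first, so each string is counted at most once per array.
    foreach ($el in ($arr | Sort-Object -Unique)) {
        $counts[$el] += 1
    }
}

# A string belongs to the intersection if every non-empty array contained it.
$result = @($counts.Keys | Where-Object { $counts[$_] -eq $nonEmpty.Count })
$result   # → test
```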

CodePudding user response:

Here is a solution using a HashSet. A HashSet is a collection that stores only unique items. It has a method, IntersectWith, which accepts any enumerable type (such as an array) as its argument. The method modifies the original HashSet in place so that it contains only the elements present in both the HashSet and the argument passed to the method.
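
To make the in-place behavior concrete, here is a small standalone sketch, separate from the solution itself. The variable name $set and the sample strings are made up for illustration, and the set is typed as HashSet[string] rather than HashSet[object] purely for this demo.

```powershell
# Build a set from an array; the duplicate 'b' is dropped.
$set = [System.Collections.Generic.HashSet[string]]::new([string[]] ('a', 'b', 'c', 'b'))
$set.Count   # → 3

# IntersectWith returns nothing; it modifies $set itself.
$set.IntersectWith([string[]] ('b', 'c', 'd'))

$set | Sort-Object   # → b, c
```

Note that a HashSet[string] created this way compares strings case-sensitively, unlike PowerShell's -eq operator. If case-insensitive matching is wanted, a comparer such as [StringComparer]::OrdinalIgnoreCase can be passed to the constructor.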

# Test input
$a = @()     # I changed this to empty array for demonstration purposes
$b = @('test','jnfijweofnew','test2')
$c = @('njwifqbfiwej','test','jnfijweofnew','test2')
$d = @('bhfeukefwgu','test','dasdwdv','test2','hfuweihfei')
$e = @('test','ddwadfedgnh','test2')
$f = @('test','test2')
$g = @('test','bjiewbnefw','test2')
$h = @('uie287278hfjf','test','huiwhiwe','test2')
$i = @()
$j = @()
$k = @('jireohngi','test','gu7y8732hbj','test2')
$l = @()
$m = @('test','test2')
$n = @('test','test2')

# Create an empty hashset
$overlap = [Collections.Generic.Hashset[object]]::new()

# For each of the arrays...
($a, $b, $c, $d, $e, $f, $g, $h, $i, $j, $k, $l, $m, $n).
    Where{ $_.Count -gt 0 }.           #... except the empty ones
    ForEach{
        # If the result Hashset is still empty
        if( $overlap.Count -eq 0 ) {
            # Create the initial hashset from the first non-empty array.
            $overlap = [Collections.Generic.Hashset[object]] $_ 
        }
        else { 
            # Hashset is already initialized, calculate the intersection with next non-empty array.
            $overlap.IntersectWith( $_ )
        }
    }

$overlap  # Output

Output:

test
test2
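
One edge case worth noting: because the code above treats an empty $overlap as "not yet initialized", an intersection that becomes empty partway through would be re-seeded from the next array. A possible variation, sketched here under the assumption that all elements are strings, uses $null as the "not yet initialized" marker instead. The function name Get-Intersection and its parameter are hypothetical.

```powershell
function Get-Intersection {
    param([object[]] $Arrays)

    $overlap = $null   # $null means "no non-empty array seen yet"
    foreach ($arr in $Arrays) {
        if ($arr.Count -eq 0) { continue }   # skip empty arrays
        if ($null -eq $overlap) {
            # Seed the set from the first non-empty array.
            $overlap = [System.Collections.Generic.HashSet[string]]::new([string[]] $arr)
        }
        else {
            # Narrow the set down; it may legitimately become empty.
            $overlap.IntersectWith([string[]] $arr)
        }
    }
    # Emit the elements, or nothing if every array was empty.
    if ($null -ne $overlap) { $overlap }
}

Get-Intersection -Arrays @('x', 'test'), @(), @('test', 'y')   # → test
Get-Intersection -Arrays @('a'), @('b'), @('c')                # → (no output)
```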

Remarks:

  • To filter out empty arrays (or in general any kind of collection), we check its Count member, which gives the number of elements.

  • .ForEach and .Where are PowerShell intrinsic methods. They can be faster than the ForEach-Object and Where-Object cmdlets, especially when working directly with collections (as opposed to the output of another command). The automatic variable $_ represents the current object, as usual.

  • This code using pipeline commands is functionally the same:

    $overlap = [Collections.Generic.Hashset[object]]::new()
    
    $a, $b, $c, $d, $e, $f, $g, $h, $i, $j, $k, $l, $m, $n |
        Where-Object Count -gt 0 |           
        ForEach-Object{
            if( $overlap.Count -eq 0 ) {  
                $overlap = [Collections.Generic.Hashset[object]] $_ 
            }
            else { 
                $overlap.IntersectWith( $_ )   
            }
        }
    
  • In the first variant, the line breaks before Where and ForEach are not strictly necessary, but they improve readability. Note that each line break must come after the dot: you cannot start a line with .Where or .ForEach, because this confuses the PowerShell parser.
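
As a small illustration of the chaining style described in the remarks, the following sketch applies the intrinsic methods to a range of numbers, with each line ending in the dot. The variable name $result and the numbers are arbitrary.

```powershell
# Each line ends with the dot; the method name starts the next line.
$result = (1..10).
    Where{ $_ % 2 -eq 0 }.      # keep the even numbers
    ForEach{ $_ * 10 }          # multiply each by ten

$result -join ','   # → 20,40,60,80,100
```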

CodePudding user response:

You're close. Excluding empty arrays from comparison is essential because the intersection of an empty array and any other array is an empty array, and once $overlap contains an empty array that will be the final result regardless of what subsequent arrays contain.

Here's your code with the non-empty check and rewritten using loops...

$a = @('hjiejnfnfsd', 'test', 'huiwe', 'test2')
$b = @('test', 'jnfijweofnew', 'test2')
$c = @('njwifqbfiwej', 'test', 'jnfijweofnew', 'test2')
$d = @('bhfeukefwgu', 'test', 'dasdwdv', 'test2', 'hfuweihfei')
$e = @('test', 'ddwadfedgnh', 'test2')
$f = @('test', 'test2')
$g = @('test', 'bjiewbnefw', 'test2')
$h = @('uie287278hfjf', 'test', 'huiwhiwe', 'test2')
$i = @()
$j = @()
$k = @('jireohngi', 'test', 'gu7y8732hbj', 'test2')
$l = @()
$m = @('test', 'test2')
$n = @('test', 'test2')

# Create an array of arrays $a through $n
$arrays = @(
    # 'a'..'n' doesn't work in Windows PowerShell
    # Define both ends of the range...
    #             'a'    → [String]
    #             'a'[0] → [Char]
    #     [Int32] 'a'[0] → 97 (ASCII a)
    # ...and cast each element back to a [Char]
    [Char[]] ([Int32] 'a'[0]..[Int32] 'n'[0]) |
        Get-Variable -ValueOnly
)

# Initialize $overlap to the first non-empty array
for ($initialOverlapIndex = 0; $initialOverlapIndex -lt $arrays.Length; $initialOverlapIndex++)
{
    if ($arrays[$initialOverlapIndex].Length -gt 0)
    {
        break;
    }
}
<#
    Alternative:
        $initialOverlapIndex = [Array]::FindIndex(
            $arrays,
            [Predicate[Array]] { param($array) $array.Length -gt 0 }
        )
#>
$overlap = $arrays[$initialOverlapIndex]

for ($comparisonIndex = $initialOverlapIndex + 1; $comparisonIndex -lt $arrays.Length; $comparisonIndex++)
# Alternative: foreach ($array in $arrays | Select-Object -Skip $initialOverlapIndex)
{
    $array = $arrays[$comparisonIndex]
    if ($array.Length -gt 0)
    {
        $overlap = Compare-Object $overlap $array -PassThru -IncludeEqual -ExcludeDifferent
    }
}

$overlap

...which outputs...

test
test2