Using PowerShell, I have 14 arrays of strings. Some of the arrays are empty. How would I get the intersection (all elements that exist in all of the arrays) of these arrays (excluding the arrays that are empty)? I am trying to avoid comparing two arrays at a time.
Any ideas on how I would approach this? Thank you.
$a = @('hjiejnfnfsd','test','huiwe','test2')
$b = @('test','jnfijweofnew','test2')
$c = @('njwifqbfiwej','test','jnfijweofnew','test2')
$d = @('bhfeukefwgu','test','dasdwdv','test2','hfuweihfei')
$e = @('test','ddwadfedgnh','test2')
$f = @('test','test2')
$g = @('test','bjiewbnefw','test2')
$h = @('uie287278hfjf','test','huiwhiwe','test2')
$i = @()
$j = @()
$k = @('jireohngi','test','gu7y8732hbj','test2')
$l = @()
$m = @('test','test2')
$n = @('test','test2')
My attempt to solve this (although it does not check for empty arrays):
$overlap = Compare-Object $a $b -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $c -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $d -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $e -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $f -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $g -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $h -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $i -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $j -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $k -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $l -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $m -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $n -PassThru -IncludeEqual -ExcludeDifferent
My desired result is that test and test2 appear in $overlap. This solution does not work because it does not check if the array it is comparing is empty.
Answer:
Note: The following assumes that no individual array contains the same string more than once (more work would be needed to address that).
$a = @('hjiejnfnfsd','test','huiwe','test2')
$b = @('test','jnfijweofnew','test2')
$c = @('njwifqbfiwej','test','jnfijweofnew','test2')
$d = @('bhfeukefwgu','test','dasdwdv','test2','hfuweihfei')
$e = @('test','ddwadfedgnh','test2')
$f = @('test','test2')
$g = @('test','bjiewbnefw','test2')
$h = @('uie287278hfjf','test','huiwhiwe','test2')
$i = @()
$j = @()
$k = @('jireohngi','test','gu7y8732hbj','test2')
$l = @()
$m = @('test','test2')
$n = @('test','test2')
$allArrays = $a, $b, $c, $d, $e, $f, $g, $h, $i, $j, $k, $l, $m, $n
# Initialize a hashtable in which we'll keep
# track of unique strings and how often they occur.
$ht = @{}
# Loop over all arrays.
foreach ($arr in $allArrays) {
    # Loop over each array's elements.
    foreach ($el in $arr) {
        # Add each string and increment its occurrence count.
        $ht[$el] += 1
    }
}
# Output all strings that occurred in every non-empty array
$ht.GetEnumerator() |
    Where-Object Value -eq ($allArrays | Where-Object Count -gt 0).Count |
    ForEach-Object Key
The above outputs those strings that are present in all of the non-empty input arrays:
test2
test
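The note at the top of this answer matters because the counting approach adds one per occurrence. If an array contains the same string twice, that string's total can reach the number of non-empty arrays without being in all of them, or overshoot it when it is. A minimal sketch of one way around this, assuming case-insensitive matching is acceptable, is to de-duplicate each non-empty array before counting and let Group-Object do the counting. The arrays $p, $q and $r are hypothetical sample data.

```powershell
# Hypothetical sample data: $p contains 'test' twice, $r is empty
$p = @('test', 'test', 'other')
$q = @('test', 'test2')
$r = @()
$allArrays = $p, $q, $r

# Count the non-empty arrays and stream each one's distinct strings
$nonEmptyCount = 0
$distinctStrings = foreach ($arr in $allArrays) {
    if ($arr.Count -gt 0) {
        $nonEmptyCount++
        # Sort-Object -Unique drops duplicates within this one array
        # (case-insensitively, like Group-Object's default grouping)
        $arr | Sort-Object -Unique
    }
}

# A string is common to all non-empty arrays if it appears once per array
$result = $distinctStrings |
    Group-Object |
    Where-Object Count -eq $nonEmptyCount |
    ForEach-Object Name

$result
```

With this data, only test is reported. Without the de-duplication step, the doubled test in $p would give it a count of 3 instead of 2.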
Answer:
Here is a solution using a HashSet. A HashSet is a collection that stores only unique items. It has a method IntersectWith, which accepts any enumerable type (such as an array) as its argument. The method modifies the original HashSet so that it contains only the elements that are contained in both the HashSet and the argument passed to the method.
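To see what IntersectWith does on its own, here is a small sketch with made-up values. It also shows a difference from Compare-Object: a HashSet[object] holding strings uses the default, case-sensitive string comparison, whereas Compare-Object compares strings case-insensitively unless -CaseSensitive is given.

```powershell
# Casting an array creates a HashSet; the duplicate 'test' is stored only once
$set = [System.Collections.Generic.HashSet[object]] @('test', 'test2', 'test', 'x')
$set.Count   # → 3

# IntersectWith changes $set in place (it returns nothing)
$set.IntersectWith(@('test2', 'test', 'y'))
$set.Count   # → 2 ('test' and 'test2' remain)

# With the default comparer, string matching is case-sensitive
$set.Contains('TEST')   # → False
```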
# Test input
$a = @() # I changed this to empty array for demonstration purposes
$b = @('test','jnfijweofnew','test2')
$c = @('njwifqbfiwej','test','jnfijweofnew','test2')
$d = @('bhfeukefwgu','test','dasdwdv','test2','hfuweihfei')
$e = @('test','ddwadfedgnh','test2')
$f = @('test','test2')
$g = @('test','bjiewbnefw','test2')
$h = @('uie287278hfjf','test','huiwhiwe','test2')
$i = @()
$j = @()
$k = @('jireohngi','test','gu7y8732hbj','test2')
$l = @()
$m = @('test','test2')
$n = @('test','test2')
# The result set; stays $null until the first non-empty array is found
$overlap = $null
# For each of the arrays...
($a, $b, $c, $d, $e, $f, $g, $h, $i, $j, $k, $l, $m, $n).
    Where{ $_.Count -gt 0 }.  #... except the empty ones
    ForEach{
        # If the result HashSet hasn't been created yet
        if( $null -eq $overlap ) {
            # Create the initial HashSet from the first non-empty array.
            $overlap = [Collections.Generic.HashSet[object]] $_
        }
        else {
            # HashSet is already initialized; intersect it with the next non-empty array.
            # (An intersection that becomes empty stays empty instead of being re-initialized.)
            $overlap.IntersectWith( $_ )
        }
    }
$overlap  # Output
Output:
test
test2
Remarks:
To filter out empty arrays (or empty collections of any kind), we check the Count member, which gives the number of elements.
.ForEach and .Where are PowerShell intrinsic methods. These can be faster than the ForEach-Object and Where-Object commands, especially when working directly with collections (as opposed to the output of another command). The automatic variable $_ represents the current object, as usual.
This code using pipeline commands is functionally the same:
$overlap = $null
$a, $b, $c, $d, $e, $f, $g, $h, $i, $j, $k, $l, $m, $n |
    Where-Object Count -gt 0 |
    ForEach-Object {
        if( $null -eq $overlap ) {
            $overlap = [Collections.Generic.HashSet[object]] $_
        }
        else {
            $overlap.IntersectWith( $_ )
        }
    }
With the first variant, inserting a line break after the dots that precede Where and ForEach is not really necessary, but it improves readability (note that you can't put the line break before .Where and .ForEach instead, because that confuses the PowerShell parser).
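If the arrays may differ only in letter case (for example 'Test' vs. 'test'), a HashSet[string] created with a case-insensitive comparer gives results closer to Compare-Object. The following is a sketch of that variant under this assumption, with hypothetical sample data. It tracks "not created yet" with $null rather than with Count, so an intersection that becomes empty along the way is not mistaken for an uninitialized set.

```powershell
# Hypothetical sample data: same strings in different letter case, plus one empty array
$arrays = @('Test', 'test2', 'x'), @(), @('test', 'TEST2')

$overlap = $null   # $null means "no non-empty array seen yet"
foreach ($arr in $arrays) {
    if ($arr.Count -eq 0) { continue }   # skip empty arrays

    if ($null -eq $overlap) {
        # First non-empty array: build the set with a case-insensitive comparer
        $overlap = [System.Collections.Generic.HashSet[string]]::new(
            [string[]] $arr, [System.StringComparer]::OrdinalIgnoreCase)
    }
    else {
        # Keep only the strings that also occur in this array (ignoring case)
        $overlap.IntersectWith([string[]] $arr)
    }
}

$overlap   # contains 'Test' and 'test2' (casing taken from the first non-empty array)
```

The set keeps the spelling of whichever array was added first; only membership tests ignore case.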
Answer:
You're close. Excluding empty arrays from the comparison is essential because the intersection of an empty array and any other array is an empty array, and once $overlap contains an empty array, that will be the final result regardless of what subsequent arrays contain.
Here's your code with the non-empty check and rewritten using loops...
$a = @('hjiejnfnfsd', 'test', 'huiwe', 'test2')
$b = @('test', 'jnfijweofnew', 'test2')
$c = @('njwifqbfiwej', 'test', 'jnfijweofnew', 'test2')
$d = @('bhfeukefwgu', 'test', 'dasdwdv', 'test2', 'hfuweihfei')
$e = @('test', 'ddwadfedgnh', 'test2')
$f = @('test', 'test2')
$g = @('test', 'bjiewbnefw', 'test2')
$h = @('uie287278hfjf', 'test', 'huiwhiwe', 'test2')
$i = @()
$j = @()
$k = @('jireohngi', 'test', 'gu7y8732hbj', 'test2')
$l = @()
$m = @('test', 'test2')
$n = @('test', 'test2')
# Create an array of arrays $a through $n
$arrays = @(
    # 'a'..'n' doesn't work in Windows PowerShell
    # Define both ends of the range...
    #   'a'            → [String]
    #   'a'[0]         → [Char]
    #   [Int32] 'a'[0] → 97 (ASCII a)
    # ...and cast each element back to a [Char]
    [Char[]] ([Int32] 'a'[0]..[Int32] 'n'[0]) |
        Get-Variable -ValueOnly
)
# Initialize $overlap to the first non-empty array
for ($initialOverlapIndex = 0; $initialOverlapIndex -lt $arrays.Length; $initialOverlapIndex++)
{
    if ($arrays[$initialOverlapIndex].Length -gt 0)
    {
        break
    }
}
<#
Alternative:
$initialOverlapIndex = [Array]::FindIndex(
$arrays,
[Predicate[Array]] { param($array) $array.Length -gt 0 }
)
#>
$overlap = $arrays[$initialOverlapIndex]
for ($comparisonIndex = $initialOverlapIndex + 1; $comparisonIndex -lt $arrays.Length; $comparisonIndex++)
# Alternative: foreach ($array in $arrays | Select-Object -Skip ($initialOverlapIndex + 1))
{
    $array = $arrays[$comparisonIndex]
    if ($array.Length -gt 0)
    {
        $overlap = Compare-Object $overlap $array -PassThru -IncludeEqual -ExcludeDifferent
    }
}
$overlap
...which outputs...
test
test2
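The same loop can also be wrapped in a reusable function. The sketch below uses the hypothetical name Get-CommonItem. It takes all arrays through one parameter, skips the empty ones, and stops early once the running intersection is empty, so an empty result is never passed back into Compare-Object.

```powershell
function Get-CommonItem {
    # Hypothetical helper: outputs the items present in every non-empty array
    # passed in -Arrays (comparison is case-insensitive, like Compare-Object)
    param([object[]] $Arrays)

    $overlap = $null
    foreach ($array in $Arrays) {
        if ($array.Count -eq 0) { continue }   # ignore empty arrays

        if ($null -eq $overlap) {
            $overlap = $array                  # start from the first non-empty array
        }
        else {
            $overlap = @(Compare-Object $overlap $array -PassThru -IncludeEqual -ExcludeDifferent)
            if ($overlap.Count -eq 0) { break }   # nothing left in common; stop early
        }
    }
    $overlap
}
```

With the question's variables, the call would be Get-CommonItem -Arrays ($a, $b, $c, $d, $e, $f, $g, $h, $i, $j, $k, $l, $m, $n).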