Compare-Object where item exists multiple times

Time:07-22

I'm using Compare-Object to compare the contents of two CSV files (1 and 2), both imported with Import-Csv. The current layout of CSV 1 is:

title    quality
Movie A  4K
Movie A  1080p
Movie A  720p
Movie B  720p

Layout of CSV 2 is:

title    quality
Movie A  4K
Movie B  1080p
Movie C  720p
Movie D  720p

In CSV 1, the same movie will normally appear several times because of the differing qualities, while in CSV 2 each movie should only ever exist once.

I've partly managed to achieve this: Compare-Object checks the contents of the CSVs and removes duplicates, but only ONE duplicate item:

Compare-Object $CSV1 $CSV2 -Property "title" -PassThru | Where-Object{$_.SideIndicator -eq '<='} | Select-Object "title", "quality"

As a result, the output is now:

title    quality
Movie A  4K
Movie A  720p
Movie B  720p

I can't seem to figure out how to achieve this. Any pointers would be useful, and I'm happy to answer questions.

CodePudding user response:

Here is a detailed example of what I think you are trying to accomplish.

$CSV1 = Import-Csv -Path ".\CSV1.csv"
$CSV2 = Import-Csv -Path ".\CSV2.csv"

#Group CSV by title to get any duplicate titles in CSV
$groupedCSV2 = $CSV2 | Group-Object Title


#iterate through CSV1 to make sure all titles are at least in CSV2 one time
#used to report the results
$report = foreach($movieEntry in $CSV1)
{
    #is movie Entry in CSV2 ?
    if($movieEntry.title -in $groupedCSV2.Name)
    {        
        $object = New-Object PSObject -Property @{
            Title = $movieEntry.title
            #how many are in csv2 ?
            Count = ($groupedCSV2| Where-Object {$_.Name -eq $movieEntry.title}).Count
            #quality in CSV2
            Quality = ($groupedCSV2 | Where-Object {$_.Name -eq $movieEntry.title}).Group.quality -join ","
        }
        #add object to $report
        $object
        
    }
    #not in CSV 2
    else
    {        
        $object = New-Object PSObject -Property @{
            Title = $movieEntry.title
            #how many are in csv2 ?
            Count = 0
            Quality = $null
        }       
        #add object to $report
        $object
    }
}


#remove all duplicates from report
$report = $report | Select -Unique Title, Quality, Count

#### EXAMPLES ####
#to view all results call $report
$report
#to view all that are not in CSV2
$report | Where-Object {$_.count -eq 0}
#view all duplicates
$report | Where-Object {$_.count -gt 1}
#view all good entries
$report | Where-Object {$_.count -eq 1}
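
The loop above re-scans $groupedCSV2 with Where-Object for every row of CSV1. As a minimal sketch of the same idea, the CSV2 counts can instead be indexed once and read back per distinct title. The inline sample rows (mirroring the question's data) and the $countByTitle hashtable are assumptions for illustration, not part of the original answer:

```powershell
# Inline sample data standing in for the two Import-Csv calls (assumption).
$CSV1 = @(
    [pscustomobject]@{ title = 'Movie A'; quality = '4K' }
    [pscustomobject]@{ title = 'Movie A'; quality = '1080p' }
    [pscustomobject]@{ title = 'Movie A'; quality = '720p' }
    [pscustomobject]@{ title = 'Movie B'; quality = '720p' }
)
$CSV2 = @(
    [pscustomobject]@{ title = 'Movie A'; quality = '4K' }
    [pscustomobject]@{ title = 'Movie B'; quality = '1080p' }
    [pscustomobject]@{ title = 'Movie C'; quality = '720p' }
    [pscustomobject]@{ title = 'Movie D'; quality = '720p' }
)

# Hypothetical helper table: title -> number of rows with that title in CSV2.
$countByTitle = @{}
foreach ($row in $CSV2) {
    # A missing key yields $null, which [int] converts to 0.
    $countByTitle[$row.title] = 1 + [int]$countByTitle[$row.title]
}

# One report row per distinct CSV1 title.
$report = foreach ($title in ($CSV1.title | Select-Object -Unique)) {
    [pscustomobject]@{
        Title = $title
        Count = [int]$countByTitle[$title]
    }
}

$report
```

The same Where-Object filters on count shown above can then be applied to this $report.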

CodePudding user response:

If you only want to know what titles in CSV1 aren't also in CSV2, irrespective of how often they appear in CSV1, and irrespective of their quality level:

Compare-Object ($CSV1.title | Select-Object -Unique) $CSV2.title -PassThru |
  Where-Object SideIndicator -eq '<='

Select-Object -Unique ensures that only distinct titles (no duplicates) are used in the comparison; calling it for $CSV2.title too isn't necessary, given your premise that the titles are unique there.

This is necessary because Compare-Object does not perform a set comparison: duplicates are matched individually against the other collection. If, for instance, the first of several duplicates is found in the other collection but the others are not, those others are still reported as differences, as the following simplified example shows:

Compare-Object (1, 2, 1) (1 , 2)

Output:

InputObject SideIndicator
----------- -------------
          1 <=

That is, the second 1 value in the first collection was reported as a difference, because it had no counterpart in the second collection (after the first 1 was matched and eliminated from the comparison).
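
To see both behaviors side by side on the question's data, the following sketch builds the two collections inline (an assumption standing in for the Import-Csv calls) and counts the '<=' results of each approach. $perRow and $perTitle are illustrative names only:

```powershell
# Inline sample data mirroring the question's CSV 1 and CSV 2 (assumption).
$CSV1 = @(
    [pscustomobject]@{ title = 'Movie A'; quality = '4K' }
    [pscustomobject]@{ title = 'Movie A'; quality = '1080p' }
    [pscustomobject]@{ title = 'Movie A'; quality = '720p' }
    [pscustomobject]@{ title = 'Movie B'; quality = '720p' }
)
$CSV2 = @(
    [pscustomobject]@{ title = 'Movie A'; quality = '4K' }
    [pscustomobject]@{ title = 'Movie B'; quality = '1080p' }
    [pscustomobject]@{ title = 'Movie C'; quality = '720p' }
    [pscustomobject]@{ title = 'Movie D'; quality = '720p' }
)

# Row-by-row comparison on title: each CSV2 row can cancel out only one CSV1 row.
$perRow = @(Compare-Object $CSV1 $CSV2 -Property title -PassThru |
    Where-Object SideIndicator -eq '<=')

# Comparison of distinct titles: duplicates are removed before comparing.
$perTitle = @(Compare-Object ($CSV1.title | Select-Object -Unique) $CSV2.title -PassThru |
    Where-Object SideIndicator -eq '<=')

"Per-row leftovers:   $($perRow.Count)"
"Per-title leftovers: $($perTitle.Count)"
```

With this sample data, the row-by-row comparison leaves two 'Movie A' rows unmatched, while the comparison of distinct titles leaves none, because every title in CSV1 also occurs in CSV2.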
