Home > Enterprise >  Filter non-duplicates from CSV with PowerShell
Filter non-duplicates from CSV with PowerShell

Time:10-14

I want to compare values in two csv files and return any entries from source2 that do NOT match entries in source1 while disregarding any duplicate entries. Below is my attempt, but it does not return all entries. What would be the best way to get this script to do what I need?

$AD = Import-CSV -Path "source1.csv"
$Student = Import-CSV -Path "source2.csv"

$AD |OuterJoin-Object $Student -on UserPrincipalName | Export-CSV -Path "Path.csv"

Source1 and Source2 csv have columns "Name", "UserPrincipalName", and "TeamDesc". I want to use those to match entries. Ideally input/output would look like this:

Source1.csv
| TeamDesc | UserPrincipalName   |   Name      |
|:---------|:--------------------|:------------|
| Team 1   | [email protected] | john smith  |
| Team 1   | [email protected] | nancy drew  |
| Team 2   | [email protected] | harvey dent |

Source2.csv
| TeamDesc |  UserPrincipalName  |   Name      |
|:---------|:--------------------|:------------|
| Team 1   | [email protected] | john smith  |
| Team 2   | [email protected] | harvey dent |

Export.csv
| TeamDesc | UserPrincipalName   |  Name      |
|:---------|:--------------------|:-----------|
| Team 1   | [email protected] | nancy drew |

CodePudding user response:

Unsure how it's done with OuterJoin-Object. I assume you wanted to do this:

$AD = Import-Csv source1.csv | Group-Object UserPrincipalName -AsHashTable -AsString
$Student = Import-CSV -Path source2.csv

@(
    $AD.Values.ForEach{ $_ }
    $Student.Where{ -not $AD.ContainsKey($_.UserPrincipalName) }
) | Export-CSV -Path Path.csv -NoTypeInformation

If you want to exclude possible duplicates coming from source2.csv, you can use this:

@(
    $AD.Values.ForEach{ $_ }
    $Student.Where{ -not $AD.ContainsKey($_.UserPrincipalName) }.
        ForEach{ $AD[$_.UserPrincipalName] = $null }
) | Export-CSV -Path Path.csv -NoTypeInformation

Looking now at your, now edited answer which provides an expected output, it seems what you actually wanted was:

$set = [System.Collections.Generic.HashSet[string]]::new(
    [string[]] (Import-CSV -Path stundent.csv).UserPrincipalName,
    [System.StringComparer]::InvariantCultureIgnoreCase
)
Import-Csv ad.csv | Where-Object { $set.Add($_.UserPrincipalName) } |
    Export-Csv path\to\output.csv -NoTypeInformation

CodePudding user response:

$Source1 = ConvertFrom-Csv @'
TeamDesc, UserPrincipalName,   Name
"Team 1", [email protected], "john smith"
"Team 1", [email protected], "nancy drew"
"Team 2", [email protected], "harvey dent"
'@

$Source1 = ConvertFrom-Csv @'
TeamDesc, UserPrincipalName,   Name
"Team 1", [email protected], "john smith"
"Team 2", [email protected], "harvey dent"
'@
$Source1 |OuterJoin $Source2 -On Name,TeamDesc

TeamDesc UserPrincipalName            Name
-------- -----------------            ----
Team 1   {[email protected], $null} nancy drew

Note that the UserPrincipalName has two values the one from the left table and the (empty) one from the right table. This will show you on with side the information is missing. You might also further distinguish the properties with the -Name parameter e.g.:

$Source1 |OuterJoin $Source2 -On Name,TeamDesc -Name Source1,Source2

TeamDesc Source1UserPrincipalName Source2UserPrincipalName Name
-------- ------------------------ ------------------------ ----
Team 1   [email protected]                               nancy drew

Or just the left properties only:

$Source1 |OuterJoin $Source2 -On Name,TeamDesc -Property 'Left.*'

TeamDesc UserPrincipalName   Name
-------- -----------------   ----
Team 1   [email protected] nancy drew
  • Related