Home > Net >  How to compare CSV file in powershell but exclude some fields of the dataset within the compare
How to compare CSV file in powershell but exclude some fields of the dataset within the compare

Time:01-20

I'm looking for a way to compare two CSV files with powershell and output only the data sets from the first given CSV file which are different. It should also be possible to exclude some fields of the data set (provided via the header field name of the CSV).

CSV example (First CSV file)

FirstName;LastName;LastUpdate;Mail;City;PostalCode
Max;Mustermann;01.01.2023;[email protected];Musterstadt;12345
Maxi;Musterfrau;01.01.2022;[email protected];Musterstadt;12345

CSV example (second CSV file)

FirstName;LastName;LastUpdate;Mail;City;PostalCode
Max;Mustermann;24.12.2023;[email protected];Musterdorf;54321
Maxi;Musterfrau;12.12.2022;[email protected];Musterstadt;12345

As shown in the CSV examples, the first dataset in CSV file 1 and 2 are different. Now, within the compare process, the field 'LastUpdate' should be ignored, so that only the fields 'FirstName;LastName;Mail;City;PostalCode' will be used for comparing the data.

The return of the compare should be only the full datasets which are different and only from file one in a array.

I tried some different things but nothing works as expected. Here a sample of my tries

# Define the file paths
$file1 = "..\file1.csv"
$file2 = "..\file2.csv"

# Read the first file into a variable
$data1 = Import-Csv $file1

# Read the second file into a variable
$data2 = Import-Csv $file2

# Compare the files, ignoring data from the 'LastUpdate' field 
$differences = Compare-Object -ReferenceObject $data1 -DifferenceObject $data2 -IncludeEqual -ExcludeDifferent -Property 'LastUpdate' | Where-Object {$_.SideIndicator -eq '<='} 

# export differences to a CSV file
$differences | Export-Csv -Path "..\Compare_result.csv" -Delimiter ";" -NoTypeInformation

I hope you guys can help me out. I thank you in advance

CodePudding user response:

Compare-Object doesn't allow you to exclude properties to compare by - any properties you want to compare by must be expressed positively, as an array of names passed to -Property.

The purpose of the -ExcludeDifferent switch is to exclude objects that compare differently and only makes sense in combination with -IncludeEqual, given that objects that compare as equal are not included by default (in PowerShell (Core) 7 , use of -ExcludeDifferent now implies -IncludeEqual).

If -Property is used, the [pscustomobject] output objects have only the specified properties. To pass the input objects through as-is, the -PassThru switch must be used.

  • The passed-through objects are decorated with an ETS (Extended Type System) .SideIndicator property, so that filtering based on what side they were unique to is still possible.
  • Caveat: if -IncludeEqual is also present, for a given pair of input objects that compare equal, it is (only) the -ReferenceObject collection's object that is passed through - even though the -DifferenceObject object may have differing values in properties that aren't being compared.

Therefore:

# Compare the files, ignoring data from the 'LastUpdate' field 
$differences = 
  Compare-Object -ReferenceObject $data1 `
                 -DifferenceObject $data2 `
                 -Property FirstName, LastName, Mail, City, PostalCode `
                 -PassThru | 
  Where-Object SideIndicator -eq '<=' 

CodePudding user response:

I found my issue. I forgot the Delimiter param for the two Import-csv, since the Delimiter in my CSV is a ;

So I changed from this code

# Read the first file into a variable
$data1 = Import-Csv $file1

# Read the second file into a variable
$data2 = Import-Csv $file2

to this code

# Read the first file into a variable
$data1 = Import-Csv $file1 -Delimiter ";"

# Read the second file into a variable
$data2 = Import-Csv $file2 -Delimiter ";"

After that I got the expected results from the compare.

  • Related