Home > Blockchain >  PowerShell - Merge CSVs
PowerShell - Merge CSVs

Time:09-06

I created a script which gets Name, CreationTime and Duration from each .mp4 file in a specific directory and than exports the data into a CSV file using the Cmdlet Export-Csv.

Now I have 3 CSV exports and I want to merge them, so I tried this:

$Data = @()
$CSVsPaths | ForEach-Object {
    $Data  = Import-Csv -Path "$_" -Encoding "UTF8"
}
$Data

But for some reason, some of the objects are duplicated, and I'm sure the exports all contains different data. What am I doing wrong?


Edit:

Here are the CSVs: https://drive.google.com/drive/folders/1MbeUenLxbKdlle6rKFwc3jJNZf85AMtd

CodePudding user response:

As mentioned in the comments, your output has duplicate rows because duplicates already exist in your data set.

To locate the duplicated rows, use Get-Content against the input files - the file system provider will attach some hidden properties to the output that we can use to identify the duplicate location(s) later:

# find all non-unique strings in the input files
$nonUniques = Get-Content '.\Export*.csv' |Group-Object |Where-Object Count -gt 1 |ForEach-Object Group 

# use the PSChildName and ReadCount provider properties to identify the files that host the duplicate content,
# then use Format-Table to show output nicely grouped on the non-unique string value
$nonUniques |Select @{Name='Name';Expression='PSChildName'},@{Name='Line';Expression='ReadCount'},@{Name='Duplicate';Expression={$_}} |Format-Table Name,Line -GroupBy Duplicate

Which, given the input data you linked, will produce something like this:

   Duplicate: "QVR_06082021_141022 (PRIMA VOLTA CHE REGISTRO).mp4","06/08/2021 14:10:22","00:00:46"

Name                                      Line
----                                      ----
Export 2 (da 24-06-2022 a 31-07-2022).csv  113
Export 3 (da 06-08-2021 a 31-08-2021).csv    3


   Duplicate: "QVR_06082021_142308.mp4","06/08/2021 14:23:08","00:00:50"

Name                                      Line
----                                      ----
Export 2 (da 24-06-2022 a 31-07-2022).csv  114
Export 3 (da 06-08-2021 a 31-08-2021).csv    4


   Duplicate: "VID_20210806_220220.mp4","06/08/2021 22:02:20","00:00:20"

Name                                      Line
----                                      ----
Export 2 (da 24-06-2022 a 31-07-2022).csv  115
Export 3 (da 06-08-2021 a 31-08-2021).csv    5
  • Related