I created a script which gets Name
, CreationTime
and Duration
from each .mp4 file in a specific directory and than exports the data into a CSV file using the Cmdlet Export-Csv
.
Now I have 3 CSV exports and I want to merge them, so I tried this:
$Data = @()
$CSVsPaths | ForEach-Object {
$Data = Import-Csv -Path "$_" -Encoding "UTF8"
}
$Data
But for some reason, some of the objects are duplicated, and I'm sure the exports all contains different data. What am I doing wrong?
Edit:
Here are the CSVs: https://drive.google.com/drive/folders/1MbeUenLxbKdlle6rKFwc3jJNZf85AMtd
CodePudding user response:
As mentioned in the comments, your output has duplicate rows because duplicates already exist in your data set.
To locate the duplicated rows, use Get-Content
against the input files - the file system provider will attach some hidden properties to the output that we can use to identify the duplicate location(s) later:
# find all non-unique strings in the input files
$nonUniques = Get-Content '.\Export*.csv' |Group-Object |Where-Object Count -gt 1 |ForEach-Object Group
# use the PSChildName and ReadCount provider properties to identify the files that host the duplicate content,
# then use Format-Table to show output nicely grouped on the non-unique string value
$nonUniques |Select @{Name='Name';Expression='PSChildName'},@{Name='Line';Expression='ReadCount'},@{Name='Duplicate';Expression={$_}} |Format-Table Name,Line -GroupBy Duplicate
Which, given the input data you linked, will produce something like this:
Duplicate: "QVR_06082021_141022 (PRIMA VOLTA CHE REGISTRO).mp4","06/08/2021 14:10:22","00:00:46"
Name Line
---- ----
Export 2 (da 24-06-2022 a 31-07-2022).csv 113
Export 3 (da 06-08-2021 a 31-08-2021).csv 3
Duplicate: "QVR_06082021_142308.mp4","06/08/2021 14:23:08","00:00:50"
Name Line
---- ----
Export 2 (da 24-06-2022 a 31-07-2022).csv 114
Export 3 (da 06-08-2021 a 31-08-2021).csv 4
Duplicate: "VID_20210806_220220.mp4","06/08/2021 22:02:20","00:00:20"
Name Line
---- ----
Export 2 (da 24-06-2022 a 31-07-2022).csv 115
Export 3 (da 06-08-2021 a 31-08-2021).csv 5