I got this script and modified it a bit (to avoid extracting the same file to one temp folder). I have two issues:
- When the script finds a duplicate, SourceArchive always shows one archive (instead of the 2 archives that hold the same file inside).
- When a compressed file holds the same file more than once in different subfolders (in the same zip), the script reports a duplicate, which is not what I want. If the compressed file has 3 files that are the same, they should be combined into 1 file and then compared to the other compressed files.
Update:
The main goal is to compare compressed files with each other in order to find duplicate files inside them. The compressed files can be cab or zip. A zip can contain dlls, xml, msi and more, and sometimes it also contains vip files (a vip is a compressed file that itself contains files such as dlls). After comparing each compressed file with the others, the output should be the compressed files that hold the same files inside. It would be great to separate the results with ----------
This should be part of a bigger script that should stop if there are duplicate files in more than 1 compressed file. So only if $MatchedSourceFiles has results should the script stop; otherwise it should continue. I hope it's clear now.
Example:
test1.zip contains temp.xml
test2.zip contains temp.xml
The output should be:
SourceArchive DuplicateFile
test1.zip temp.xml
test2.zip temp.xml
------------------------------
The next duplication files
------------------------------
Example 2: (multiple identical files in the same compressed file)
test1.zip contains 2 subfolders
test1.zip contains temp.xml under subfolder1 and also temp.xml under subfolder2
The result should be none
SourceArchive DuplicateFile
Example 3:
test1.zip same as in example 2
test3.zip contains temp.xml
The result should be:
SourceArchive DuplicateFile
test1.zip temp.xml
test3.zip temp.xml
------------------------------
The next duplication files
------------------------------
The next duplication files
------------------------------
Add-Type -AssemblyName System.IO.Compression.FileSystem
$tempFolder = Join-Path -Path ([IO.Path]::GetTempPath()) -ChildPath (New-GUID).Guid
$compressedfiles = Get-ChildItem -Path 'C:\Intel' -Include '*.zip', '*.CAB' -File -Recurse
$MatchedSourceFiles = foreach ($file in $compressedfiles) {
    switch ($file.Extension) {
        '.zip' {
            $t = $tempFolder + "\" + $file.Name
            # the destination folder should NOT already exist here
            $null = [System.IO.Compression.ZipFile]::ExtractToDirectory($file.FullName, $t)
            try {
                Get-ChildItem -Path $tempFolder -Filter '*.vip' -File -Recurse | ForEach-Object {
                    $null = [System.IO.Compression.ZipFile]::ExtractToDirectory($_.FullName, $t)
                }
            }
            catch {}
        }
        '.cab' {
            # the destination folder MUST exist for expanding .cab files
            $null = New-Item -Path $tempFolder -ItemType Directory -Force
            expand.exe $file.FullName -F:* $tempFolder > $null
        }
    }
    # now see if there are files with duplicate names
    Get-ChildItem -Path $tempFolder -File -Recurse -Exclude vip.manifest, filesSources.txt, *.vip | Group-Object Name |
        Where-Object { $_.Count -gt 1 } | ForEach-Object {
            foreach ($item in $_.Group) {
                # output objects to be collected in $MatchedSourceFiles
                [PsCustomObject]@{
                    SourceArchive = $file.FullName
                    DuplicateFile = '.{0}' -f $item.FullName.Substring($tempFolder.Length) # relative path
                }
            }
        }
}
# display on screen
$MatchedSourceFiles
$tempFolder | Remove-Item -Force -Recurse
CodePudding user response:
Thanks for the examples. Using these, I changed my previous code to this:
Add-Type -AssemblyName System.IO.Compression.FileSystem
$tempFolder = Join-Path -Path ([IO.Path]::GetTempPath()) -ChildPath (New-GUID).Guid
$compressedfiles = Get-ChildItem -Path 'C:\Intel' -Include '*.zip','*.CAB' -File -Recurse
$MatchedSourceFiles = foreach ($file in $compressedfiles) {
    switch ($file.Extension) {
        '.zip' {
            # the destination folder should NOT already exist here
            $null = [System.IO.Compression.ZipFile]::ExtractToDirectory($file.FullName, $tempFolder)
            # prepare a subfolder name for .vip files
            $subTemp = Join-Path -Path $tempFolder -ChildPath ([datetime]::Now.Ticks)
            Get-ChildItem -Path $tempFolder -Filter '*.vip' -File -Recurse | ForEach-Object {
                $null = [System.IO.Compression.ZipFile]::ExtractToDirectory($_.FullName, $subTemp)
            }
        }
        '.cab' {
            # the destination folder MUST exist for expanding .cab files
            $null = New-Item -Path $tempFolder -ItemType Directory -Force
            expand.exe $file.FullName -F:* $tempFolder > $null
        }
    }
    # output objects for each unique file name in the extracted folder to collect in $MatchedSourceFiles
    Get-ChildItem -Path $tempFolder -File -Recurse |
        Select-Object @{Name = 'SourceArchive'; Expression = {$file.FullName}},
                      @{Name = 'FileName'; Expression = {$_.Name}} -Unique
    # delete the temporary folder
    $tempFolder | Remove-Item -Force -Recurse
}
# at this point $MatchedSourceFiles contains all (unique) filenames from all .zip and/or .cab files
# now see if there are files with duplicate names between the archive files
$result = $MatchedSourceFiles | Group-Object FileName | Where-Object { $_.Count -gt 1 } | ForEach-Object {$_.Group}
# display on screen
$result
# save as CSV file
$result | Export-Csv -Path 'X:\DuplicateFiles.csv' -UseCulture -NoTypeInformation
The output would be:
Example 1:
SourceArchive FileName
------------- --------
C:\Intel\test1.zip temp.xml
C:\Intel\test2.zip temp.xml
Example 2:
no output
Example 3:
SourceArchive FileName
------------- --------
C:\Intel\test1.zip temp.xml
C:\Intel\test3.zip temp.xml
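The question also asked for a dashed separator between groups of duplicates and for the surrounding script to stop when duplicates are found. A minimal sketch of one way to do both, assuming objects with the same SourceArchive and FileName properties as in the code above. The sample data, the helper name Format-DuplicateReport and the $hasDuplicates variable are hypothetical and only there for illustration:

```powershell
# Hypothetical sample data, shaped like the objects collected in $MatchedSourceFiles
# (one object per unique file name per archive).
$MatchedSourceFiles = @(
    [PsCustomObject]@{ SourceArchive = 'C:\Intel\test1.zip'; FileName = 'temp.xml' }
    [PsCustomObject]@{ SourceArchive = 'C:\Intel\test2.zip'; FileName = 'temp.xml' }
    [PsCustomObject]@{ SourceArchive = 'C:\Intel\test1.zip'; FileName = 'a.dll' }
    [PsCustomObject]@{ SourceArchive = 'C:\Intel\test3.zip'; FileName = 'a.dll' }
    [PsCustomObject]@{ SourceArchive = 'C:\Intel\test3.zip'; FileName = 'only-here.msi' }
)

# Hypothetical helper: emits one text line per archive/file pair for every file name
# that occurs in more than one archive, followed by a dashed separator per group.
function Format-DuplicateReport {
    param([object[]]$Items)

    $groups = @($Items | Group-Object -Property FileName | Where-Object { $_.Count -gt 1 })
    foreach ($group in $groups) {
        foreach ($item in $group.Group) {
            '{0} {1}' -f $item.SourceArchive, $item.FileName
        }
        '-' * 30
    }
}

$report = @(Format-DuplicateReport -Items $MatchedSourceFiles)
$report

# True only when at least one file name was found in more than one archive
$hasDuplicates = $report.Count -gt 0
if ($hasDuplicates) {
    Write-Warning 'Duplicate files found in more than one archive.'
    # throw 'Duplicate files found.'   # uncomment to halt the surrounding script
}
```

In the real script, the sample array would be replaced by the $MatchedSourceFiles collected in the loop. Whether to use throw, exit or return to stop depends on how the bigger script is structured.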