PowerShell: searching for a string in files and comparing with other files to find duplicates

Time:03-28

The script below searches for the string 'Package ID=' in files located inside VIP and ZIP files. Each VIP contains exactly one vip.manifest, which holds at least one GUID for Package ID. A ZIP file contains a VIP file. As you can see, the content is extracted to a temp folder and deleted at the end. My path contains many VIPs and ZIPs, and I need to know whether there is any duplication, that is, whether more than one manifest holds the same GUID, and to display which files contain the duplicate. When I run it, I can see all the GUIDs from all the ZIPs/VIPs in the path.

function checkpackageID([string]$_path)
{
Add-Type -AssemblyName System.IO.Compression, System.IO.Compression.FileSystem

$path = $_path
$tempFolder = Join-Path ([IO.Path]::GetTempPath()) (New-GUID).ToString('n')
$compressedfiles = Get-ChildItem -path $path\* -Include "*.vip","*.zip"

foreach ($file in $compressedfiles) 
{   
    if ($file -like "*.zip")
    {
     try 
     { 
        $zip = [System.IO.Compression.ZipFile]::ExtractToDirectory($file, $tempFolder)
        $test = Get-ChildItem -path $tempFolder\* -Include "*.vip" 
       
        if ($test)
        {
            $zip2 = [System.IO.Compression.ZipFile]::ExtractToDirectory($test, $tempFolder)
            $guidmaps = Get-ChildItem $tempFolder -Include "*.manifest" -Recurse
            write-host    
            foreach($guidmap in $guidmaps) 
            {
               switch -Regex -File($guidmap) { 
               '(?<=<Package ID=")(?<guid>[\d\w-]+)"' {
               [pscustomobject]@{
               Guid = $Matches['guid']
               Path = $guidmap.FullName
            }
        }
    }
}
            $guidmap = $guidmap | Group-Object Guid | Where-Object Count -GT 1 | ForEach-Object Group
              
            }

        $guidmap
     }
     catch 
     {
            Write-Warning $_.Exception.Message
            continue
     }
     finally 
     {
               Remove-Item $tempFolder -Force -Recurse
     }
    }
    elseif ($file -like "*.vip") #vip
    {
     try 
     { 
        $zip = [System.IO.Compression.ZipFile]::ExtractToDirectory($file, $tempFolder)
        $guidmaps = Get-ChildItem $tempFolder -Include "*.manifest" -Recurse
        write-host
        foreach($guidmap in $guidmaps) 
        {            
            switch -Regex -File($guidmap) { 
               '(?<=<Package ID=")(?<guid>[\d\w-]+)"' {
               [pscustomobject]@{
               Guid = $Matches['guid']
               Path = $guidmap.FullName
            }
        }
    }
}
        $guidmap = $guidmap | Group-Object Guid | Where-Object Count -GT 1 | ForEach-Object Group
        $guidmap  
     }
        
     catch 
     {
            Write-Warning $_.Exception.Message
            continue
     }
     finally 
     {
               Remove-Item $tempFolder -Force -Recurse
     }  
    }
     
    }

} 

CodePudding user response:

Instead of extracting all .manifest files to a folder from your .zip and .vip, you can read the entries directly in memory. Assuming there could be .vip files contained in the .zip, one approach would be to use a recursive function that will search for all the .manifest files. Once all GUIDs have been extracted using the function, the logic using Group-Object would remain the same.

using namespace System.IO
using namespace System.IO.Compression

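# On Windows PowerShell 5.1 the ZipFile class lives in System.IO.Compression.FileSystem
Add-Type -AssemblyName System.IO.Compression, System.IO.Compression.FileSystem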

function Get-ManifestFile {
    [cmdletbinding()]
    param(
        [parameter(ValueFromPipeline, Mandatory)]
        [object] $Path,
        [string] $TargetExtension = '.manifest',
        [string] $Pattern = '(?<=<Package ID=")(?<guid>[\d\w-]+)"',
        [Parameter(DontShow)]
        [string] $Parent
    )

    process {

        try {
            if($Path -isnot [FileInfo]) {
                $zip = [ZipArchive]::new($Path.Open())
                $filePath = $Parent
            }
            else {
                $zip = [ZipFile]::OpenRead($Path.FullName)
                $filePath = $Path.FullName
            }

            foreach($entry in $zip.Entries) {
                # if the entry is a `manifest` file, read it
                if([Path]::GetExtension($entry) -eq $TargetExtension) {
                    try {
                        $handle = $entry.Open()
                        $reader = [StreamReader]::new($handle)
                        while(-not $reader.EndOfStream) {
                            if($reader.ReadLine() -match $Pattern) {
                                [pscustomobject]@{
                                    Guid         = $Matches['guid']
                                    FilePath     = $filePath
                                    ZipEntryPath = $entry.FullName
                                }
                            }
                        }
                    }
                    catch { $PSCmdlet.WriteError($_) }
                    finally {
                        ($reader, $handle).ForEach('Dispose')
                    }
                }
                # if the entry is a `.vip` file use recursion
                if([Path]::GetExtension($entry) -eq '.vip') {
                    Get-ManifestFile -Path $entry -Parent $filePath
                }
            }
        }
        catch { $PSCmdlet.WriteError($_) }
        finally {
            # only the archive is disposable; disposing it also closes its underlying stream
            if($zip) { $zip.Dispose() }
        }
    }
}

$path = "Define Path Here!!!"
$result = Get-ChildItem $path\* -Include '*.vip', '*.zip' |
    Get-ManifestFile | Group-Object Guid | Where-Object Count -GT 1 |
        ForEach-Object Group

if(-not $result) {
    'No duplicates found.'
}
else { $result }
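
The point that matters for finding duplicates across archives is that the grouping runs once, after the GUIDs from every file have been collected, rather than once per file. A minimal sketch of that step on sample data follows; the `$samples` table, its paths and its GUIDs are made up for illustration, and the pattern assumes a `+` quantifier so that the whole GUID is captured.

```powershell
# Hypothetical sample data: one manifest line per made-up archive path.
$samples = [ordered]@{
    'C:\demo\a.vip' = '<Package ID="11111111-aaaa-4bbb-8ccc-000000000001" Name="A">'
    'C:\demo\b.zip' = '<Package ID="22222222-aaaa-4bbb-8ccc-000000000002" Name="B">'
    'C:\demo\c.vip' = '<Package ID="11111111-aaaa-4bbb-8ccc-000000000001" Name="C">'
}

# Assumed pattern: capture everything between 'Package ID="' and the closing quote.
$pattern = '(?<=<Package ID=")(?<guid>[\d\w-]+)"'

# Step 1: collect one object per match from ALL files.
$all = foreach ($item in $samples.GetEnumerator()) {
    foreach ($line in $item.Value -split '\r?\n') {
        if ($line -match $pattern) {
            [pscustomobject]@{
                Guid     = $Matches['guid']
                FilePath = $item.Key
            }
        }
    }
}

# Step 2: group once over the combined set and keep groups with more than one member.
$duplicates = $all | Group-Object Guid | Where-Object Count -GT 1 | ForEach-Object Group
$duplicates
```

With this sample input, `$duplicates` holds the records for a.vip and c.vip, since they share a GUID, while b.zip is dropped.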