Powershell script to compare two directories (including sub directories and contents) that are suppo

Time:11-11

I would like to run a PowerShell script that takes a directory name from the user and then checks that directory, its subdirectories, and the contents of all their files to determine whether they are identical across servers. There are 8 servers that should all have identical files and contents. The code below does not appear to do what I intended. I have seen Compare-Object, Get-ChildItem, and Get-FileHash used, but I have not found a combination that I am certain actually accomplishes the task. Any and all help is appreciated!

$35 = "\\server1\"
$36 = "\\server2\"
$37 = "\\server3\"
$38 = "\\server4\"
$45 = "\\server5\"
$46 = "\\server6\"
$47 = "\\server7\"
$48 = "\\server8\"
do{
Write-Host "|1 : New   |"
Write-Host "|2 : Repeat|"
Write-Host "|3 : Exit  |"
$choice = Read-Host -Prompt "Please make a selection"
    switch ($choice){
        1{
            $App = Read-Host -Prompt "Input Directory Application"
        }
        2{
            #rerun
        }
        3{
            exit
        }
    }

$c35 = $35 + "$App" + "\*"
$c36 = $36 + "$App" + "\*"
$c37 = $37 + "$App" + "\*"
$c38 = $38 + "$App" + "\*"
$c45 = $45 + "$App" + "\*"
$c46 = $46 + "$App" + "\*"
$c47 = $47 + "$App" + "\*"
$c48 = $48 + "$App" + "\*"

Write-Host "Comparing Server1 -> Server2"
if((Get-ChildItem $c35 -Recurse | Get-FileHash | Select-Object Hash,Path).hash -eq (Get-ChildItem $c36 -Recurse | Get-FileHash | Select-Object Hash,Path).hash){"Identical"}else{"NOT Identical"}

Write-Host "Comparing Server1 -> Server3"
if((Get-ChildItem $c35 -Recurse | Get-FileHash | Select-Object Hash,Path).hash -eq (Get-ChildItem $c37 -Recurse | Get-FileHash | Select-Object Hash,Path).hash){"Identical"}else{"NOT Identical"}

Write-Host "Comparing Server1 -> Server4"
if((Get-ChildItem $c35 -Recurse | Get-FileHash | Select-Object Hash,Path).hash -eq (Get-ChildItem $c38 -Recurse | Get-FileHash | Select-Object Hash,Path).hash){"Identical"}else{"NOT Identical"}

Write-Host "Comparing Server1 -> Server5"
if((Get-ChildItem $c35 -Recurse | Get-FileHash | Select-Object Hash,Path).hash -eq (Get-ChildItem $c45 -Recurse | Get-FileHash | Select-Object Hash,Path).hash){"Identical"}else{"NOT Identical"}

Write-Host "Comparing Server1 -> Server6"
if((Get-ChildItem $c35 -Recurse | Get-FileHash | Select-Object Hash,Path).hash -eq (Get-ChildItem $c46 -Recurse | Get-FileHash | Select-Object Hash,Path).hash){"Identical"}else{"NOT Identical"}

Write-Host "Comparing Server1 -> Server7"
if((Get-ChildItem $c35 -Recurse | Get-FileHash | Select-Object Hash,Path).hash -eq (Get-ChildItem $c47 -Recurse | Get-FileHash | Select-Object Hash,Path).hash){"Identical"}else{"NOT Identical"}

Write-Host "Comparing Server1 -> Server8"
if((Get-ChildItem $c35 -Recurse | Get-FileHash | Select-Object Hash,Path).hash -eq (Get-ChildItem $c48 -Recurse | Get-FileHash | Select-Object Hash,Path).hash){"Identical"}else{"NOT Identical"}

} until ($choice -eq 3)

Answer:

Here is an example function that compares one reference directory against multiple difference directories efficiently. It checks the most readily available information first and stops at the first difference it finds.

  • Get all relevant information about the files in the reference directory once, including hashes (this could be optimized further by computing hashes only when they are actually needed).
  • For each difference directory, compare in this order:
    • file count - if different, the directories are obviously different
    • relative file paths - if any path from the difference directory is missing from the reference directory, the directories are different
    • file sizes - files of different size cannot have the same content
    • file hashes - hashes only need to be calculated if the files have equal size
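The size-before-hash ordering in the last two steps can be shown on a single pair of files. This is a minimal sketch and is not part of the answer's function. `Test-FileContentEqual` is a hypothetical name, and it uses Get-FileHash's default SHA256 algorithm instead of MD5.

```powershell
# Hypothetical helper (not part of Compare-MultipleDirectories): compares two single files,
# checking the cheap property (size) before the expensive one (hash).
function Test-FileContentEqual {
    param(
        [Parameter(Mandatory)] [string] $Path1,
        [Parameter(Mandatory)] [string] $Path2
    )

    # Get-Item only reads file metadata, not the content
    $file1 = Get-Item -LiteralPath $Path1
    $file2 = Get-Item -LiteralPath $Path2

    # Different sizes: the content cannot be identical, so neither file needs to be read
    if ($file1.Length -ne $file2.Length) {
        return $false
    }

    # Same size: read both files and compare their hashes (Get-FileHash defaults to SHA256)
    (Get-FileHash -LiteralPath $file1.FullName).Hash -eq (Get-FileHash -LiteralPath $file2.FullName).Hash
}
```

For example, `Test-FileContentEqual '\\server1\App\web.config' '\\server2\App\web.config'` (hypothetical paths) never reads either file's content when their sizes already differ.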
Function Compare-MultipleDirectories {
    param(
        [Parameter(Mandatory)] [string] $ReferencePath,
        [Parameter(Mandatory)] [string[]] $DifferencePath
    )

    # Get basic file information recursively via Get-ChildItem, adding each file's path relative to $Path
    Function Get-ChildItemRelative {
        param( [Parameter(Mandatory)] [string] $Path )

        Push-Location -LiteralPath $Path  # Base path for Get-ChildItem and Resolve-Path
        try { 
            Get-ChildItem -File -Recurse | Select-Object FullName, Length, @{ n = 'RelativePath'; e = { Resolve-Path -LiteralPath $_.FullName -Relative } }
        } finally { 
            Pop-Location 
        }
    }

    Write-Verbose "Reading reference directory '$ReferencePath'"

    # Create hashtable with all information about the reference directory, keyed by relative path
    $refFiles = Get-ChildItemRelative $ReferencePath |
        Select-Object *, @{ n = 'Hash'; e = { (Get-FileHash -LiteralPath $_.FullName -Algorithm MD5).Hash } } | 
        Group-Object RelativePath -AsHashTable 

    # Compare content of each directory of $DifferencePath with $ReferencePath
    foreach( $diffPath in $DifferencePath ) {
        Write-Verbose "Comparing directory '$diffPath' with '$ReferencePath'"
        
        $areDirectoriesEqual = $false
        $differenceType = ''

        $diffFiles = Get-ChildItemRelative $diffPath

        # Directories must have same number of files
        if( $diffFiles.Count -eq $refFiles.Count ) {

            # Find first different path (if any)
            $firstDifferentPath = $diffFiles | Where-Object { -not $refFiles.ContainsKey( $_.RelativePath ) } | 
                                  Select-Object -First 1

            if( -not $firstDifferentPath ) {

                # Find first different content (if any) by file size comparison
                $firstDifferentFileSize = $diffFiles |
                    Where-Object { $refFiles[ $_.RelativePath ].Length -ne $_.Length } |
                    Select-Object -First 1

                if( -not $firstDifferentFileSize ) {

                    # Find first different content (if any) by hash comparison
                    $firstDifferentContent = $diffFiles | 
                        Where-Object { $refFiles[ $_.RelativePath ].Hash -ne (Get-FileHash -LiteralPath $_.FullName -Algorithm MD5).Hash } | 
                        Select-Object -First 1
                
                    if( -not $firstDifferentContent ) {
                        $areDirectoriesEqual = $true
                    }
                    else {
                        $differenceType = 'Content'
                    } 
                }
                else {
                    $differenceType = 'FileSize'
                }
            }
            else {
                $differenceType = 'Path'
            }
        }
        else {
            $differenceType = 'FileCount'
        }

        # Output comparison result
        [PSCustomObject]@{ 
            ReferencePath = $ReferencePath  
            DifferencePath = $diffPath  
            Equal = $areDirectoriesEqual  
            DiffCause = $differenceType 
        }
    }
}

Usage example:

# compare each of directories B, C, D, E, F against A
Compare-MultipleDirectories -ReferencePath 'A' -DifferencePath 'B', 'C', 'D', 'E', 'F' -Verbose

Output example:

ReferencePath DifferencePath Equal DiffCause
------------- -------------- ----- ---------
A             B               True 
A             C              False FileCount
A             D              False Path     
A             E              False FileSize 
A             F              False Content 

The DiffCause column tells you why the function considers the directories different.
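DiffCause only reports the first reason found. If you need the full list of differing files, for example to see what must be resynced on which server, the Compare-Object / Get-ChildItem / Get-FileHash combination mentioned in the question can produce it. This is a minimal sketch under those assumptions. `Get-DirectoryDifference` and its inner helper are hypothetical names. Unlike the function above, it hashes every file with the default SHA256 algorithm and does not stop early.

```powershell
# Hypothetical helper: lists every file that differs between two directory trees,
# rather than stopping at the first difference like Compare-MultipleDirectories.
function Get-DirectoryDifference {
    param(
        [Parameter(Mandatory)] [string] $ReferencePath,
        [Parameter(Mandatory)] [string] $DifferencePath
    )

    # One record per file: path relative to the tree root, plus the SHA256 hash of its content
    function Get-HashedTree([string] $Root) {
        $rootFull = (Resolve-Path -LiteralPath $Root).ProviderPath.TrimEnd('\', '/')
        Get-ChildItem -LiteralPath $rootFull -File -Recurse | ForEach-Object {
            [PSCustomObject]@{
                RelativePath = $_.FullName.Substring($rootFull.Length + 1)
                Hash         = (Get-FileHash -LiteralPath $_.FullName).Hash
            }
        }
    }

    $ref  = @(Get-HashedTree $ReferencePath)
    $diff = @(Get-HashedTree $DifferencePath)

    # If one tree has no files, every file in the other tree is a difference
    if ($ref.Count -eq 0 -or $diff.Count -eq 0) {
        $ref  | Select-Object RelativePath, Hash, @{ n = 'SideIndicator'; e = { '<=' } }
        $diff | Select-Object RelativePath, Hash, @{ n = 'SideIndicator'; e = { '=>' } }
        return
    }

    # Compare on path AND hash: a changed file shows up twice ('<=' and '=>'),
    # while a file present on only one side shows up once
    Compare-Object -ReferenceObject $ref -DifferenceObject $diff -Property RelativePath, Hash
}
```

In the output, `<=` marks records that exist only in the reference tree and `=>` marks records that exist only in the difference tree.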

Note:

  • Select-Object -First 1 is a neat trick to stop searching once the first result has been found. It is efficient because it does not process all input and then discard everything except the first item. Instead, it stops the pipeline as soon as the first item arrives.
  • Group-Object RelativePath -AsHashTable creates a hashtable of the file information so it can be looked up quickly by the RelativePath property.
  • Empty subdirectories are ignored, because the function only looks at files. E.g., if the reference path contains some empty directories that the difference path lacks, and the files in all other directories are equal, the function treats the directories as equal.
  • I've chosen the MD5 algorithm because it is faster than SHA-256, the default algorithm used by Get-FileHash. However, MD5 is insecure: someone who controls file contents can craft two different files with the same MD5 hash. In a trusted environment this won't matter. Remove -Algorithm MD5 if you need a more secure comparison.
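If empty subdirectories should count as a difference, one option is to compare the directory structure separately. This is a minimal sketch, assuming you run it alongside Compare-MultipleDirectories and treat two trees as equal only when both checks pass. `Test-SameDirectoryStructure` is a hypothetical name.

```powershell
# Hypothetical helper: returns $true if both trees contain the same set of
# subdirectories (including empty ones), ignoring files entirely.
function Test-SameDirectoryStructure {
    param(
        [Parameter(Mandatory)] [string] $ReferencePath,
        [Parameter(Mandatory)] [string] $DifferencePath
    )

    # Relative paths of all subdirectories below $Root
    function Get-RelativeDirectory([string] $Root) {
        $rootFull = (Resolve-Path -LiteralPath $Root).ProviderPath.TrimEnd('\', '/')
        Get-ChildItem -LiteralPath $rootFull -Directory -Recurse |
            ForEach-Object { $_.FullName.Substring($rootFull.Length + 1) }
    }

    # @() guarantees arrays even when a tree has zero or one subdirectory
    $refDirs  = @(Get-RelativeDirectory $ReferencePath)
    $diffDirs = @(Get-RelativeDirectory $DifferencePath)

    # If either side has no subdirectories, the structures match only if both are empty
    if ($refDirs.Count -eq 0 -or $diffDirs.Count -eq 0) {
        return $refDirs.Count -eq $diffDirs.Count
    }

    # Compare-Object outputs nothing when both lists contain the same entries
    -not (Compare-Object -ReferenceObject $refDirs -DifferenceObject $diffDirs)
}
```

For example, treat `A` and `B` as equal only if `(Compare-MultipleDirectories -ReferencePath 'A' -DifferencePath 'B').Equal` and `Test-SameDirectoryStructure 'A' 'B'` both return `$true`.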