Recreating a linux md5 checksum on Windows

Time:09-17

I'm performing some pretty straightforward checksums on our Linux boxes, but I now need to recreate something similar for our Windows users. To get a single checksum, I just run:

md5sum *.txt | awk '{ print $1 }' | md5sum
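
To reproduce this on another platform, it helps to spell out what the final md5sum actually reads: one lowercase hex digest per line, each terminated by a single LF, in the order the shell expanded *.txt. A minimal sketch of that equivalence, assuming GNU md5sum and using a throwaway directory with two hypothetical files (a.txt and b.txt are illustrative names, not from the question):

```shell
# Throwaway directory with two hypothetical sample files.
workdir=$(mktemp -d)
printf 'first\n'  > "$workdir/a.txt"
printf 'second\n' > "$workdir/b.txt"

# The original one-liner, trimmed to the bare digest.
combined=$(cd "$workdir" && md5sum *.txt | awk '{ print $1 }' | md5sum | awk '{ print $1 }')

# The same result in explicit steps: hash each file, then hash the
# digest list, where every digest is followed by exactly one LF.
hash_a=$(md5sum "$workdir/a.txt" | awk '{ print $1 }')
hash_b=$(md5sum "$workdir/b.txt" | awk '{ print $1 }')
manual=$(printf '%s\n%s\n' "$hash_a" "$hash_b" | md5sum | awk '{ print $1 }')

echo "one-liner: $combined"
echo "manual:    $manual"

rm -rf "$workdir"
```

Both lines should print the same digest, so any Windows implementation has to rebuild exactly that byte sequence before hashing it.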

I'm struggling to recreate this in Windows, either with a batch file or PowerShell. The closest I've got is:

Get-ChildItem $path -Filter *.txt | 
Foreach-Object {
   $hash =  Get-FileHash -Algorithm MD5 -Path ($path + "\" + $_) | Select -ExpandProperty "Hash"
   $hash = $hash.tolower()  #Get-FileHash returns checksums in uppercase, linux in lower case (!)
   Write-host $hash
}

This prints the same per-file checksums to the console as the Linux command, but piping them back into Get-FileHash to get a single output that matches the Linux equivalent is eluding me. Writing them to a file gets me stuck on carriage-return differences.
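
The carriage-return problem is easy to confirm from the Linux (or WSL) side: the same digest line hashes differently depending on whether it ends in LF or CRLF. A small sketch, assuming GNU md5sum; the sample text is simply the MD5 of empty input, used here as a placeholder digest line:

```shell
# Placeholder digest line (the MD5 of empty input).
digest_line='d41d8cd98f00b204e9800998ecf8427e'

# Hash the line with a Unix line ending (LF) ...
lf_hash=$(printf '%s\n' "$digest_line" | md5sum | awk '{ print $1 }')

# ... and with a Windows line ending (CRLF).
crlf_hash=$(printf '%s\r\n' "$digest_line" | md5sum | awk '{ print $1 }')

echo "LF:   $lf_hash"
echo "CRLF: $crlf_hash"
```

The two results differ, which is why a digest list written by a Windows tool with default line endings will not match the Linux pipeline.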

Streaming as a string back to Get-FileHash doesn't return the same checksum:

$String = Get-FileHash -Algorithm MD5 -Path (Get-ChildItem -path $files -Recurse) | Select -ExpandProperty "Hash"
$stringAsStream = [System.IO.MemoryStream]::new()
$writer = [System.IO.StreamWriter]::new($stringAsStream)
$writer.write($stringAsStream)
Get-FileHash -Algorithm MD5 -InputStream $stringAsStream

Am I over-engineering this? I'm sure this shouldn't be this complicated! TIA

CodePudding user response:

You need to reference the .Hash property on the object returned by Get-FileHash. If you want a view similar to md5sum, you can also use Select-Object to curate this:

# Get filehashes in $path with similar output to md5sum
$fileHashes = Get-ChildItem $path -File | Get-FileHash -Algorithm MD5

# Once you have the hashes, you can reference the properties as follows
# .Algorithm is the hashing algo
# .Hash is the actual file hash
# .Path is the full path to the file
foreach( $hash in $fileHashes ){
  "$($hash.Algorithm):$($hash.Hash) ($($hash.Path))"
}

For each file in $path, the above foreach loop produces a line similar to:

MD5:B4976887F256A26B59A9D97656BF2078 (C:\Users\username\dl\installer.msi)

The algorithm, hash, and filenames will obviously differ based on your selected hashing algorithm and filesystem.

CodePudding user response:

The devil is in the details:

  • (already known) Get-FileHash returns checksums in uppercase, while Linux md5sum returns them in lowercase (!);
  • The FileSystem provider's *.txt filter is not case-sensitive in PowerShell, while on Linux this depends on the nocaseglob option. If it is set (shopt -s nocaseglob), Bash matches filenames case-insensitively when performing filename expansion. Otherwise (shopt -u nocaseglob), filename matching is case-sensitive;
  • Order: Get-ChildItem output is ordered according to the Unicode collation algorithm, while on Linux the *.txt glob is expanded in the order given by the LC_COLLATE category (LC_COLLATE="C.UTF-8" on my system).
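
The first and third points can be checked in isolation on the Linux side. The sketch below assumes GNU md5sum; the digest is the sample value from the previous answer and the file names are hypothetical. It shows that letter case changes the final hash, and that byte-wise (C locale) ordering places uppercase names before lowercase ones:

```shell
# 1. Case matters: the same digest in lower and upper case hashes differently.
lower='b4976887f256a26b59a9d97656bf2078'
upper=$(printf '%s' "$lower" | tr 'a-f' 'A-F')
hash_lower=$(printf '%s\n' "$lower" | md5sum | awk '{ print $1 }')
hash_upper=$(printf '%s\n' "$upper" | md5sum | awk '{ print $1 }')
echo "lowercase digest -> $hash_lower"
echo "uppercase digest -> $hash_upper"

# 2. Order matters: in the C locale, names are compared byte by byte,
#    so uppercase letters (0x41..) sort before lowercase ones (0x61..).
c_order=$(printf '%s\n' b.txt A.txt a.txt B.txt | LC_ALL=C sort | paste -sd ' ' -)
echo "$c_order"   # A.txt B.txt a.txt b.txt
```

A culture-aware sort would typically interleave those names instead (a.txt next to A.txt), which changes the order of the digest list and therefore the combined hash.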

In the following (partially commented) script, three # Test blocks demonstrate my debugging steps to the final solution:

Function Get-StringHash {
    [OutputType([System.String])]
    param(
        # named or positional: a string
        [Parameter(Position=0)]
        [string]$InputObject
    )
    $stringAsStream = [System.IO.MemoryStream]::new()
    $writer = [System.IO.StreamWriter]::new($stringAsStream)
    $writer.write( $InputObject)
    $writer.Flush()
    $stringAsStream.Position = 0
    Get-FileHash -Algorithm MD5 -InputStream $stringAsStream |
        Select-Object -ExpandProperty Hash
    $writer.Close()
    $writer.Dispose()
    $stringAsStream.Close()
    $stringAsStream.Dispose()
}

function ConvertTo-Utf8String {
    [OutputType([System.String])]
    param(
        # named or positional: a string
        [Parameter(Position=0, Mandatory = $false)]
        [string]$InputObject = ''
    )
    begin {
        $InChars  = [char[]]$InputObject
        $InChLen  = $InChars.Count
        $AuxU_8 = [System.Collections.ArrayList]::new()
    }
    process {
        for ($ii= 0; $ii -lt $InChLen; $ii++ ) {
            if ( [char]::IsHighSurrogate( $InChars[$ii]) -and
                    ( 1 + $ii) -lt  $InChLen             -and
                    [char]::IsLowSurrogate( $InChars[1 + $ii]) ) {
                $s = [char]::ConvertFromUtf32(
                     [char]::ConvertToUtf32( $InChars[$ii], $InChars[1 + $ii]))
                $ii++
            } else {
                $s = $InChars[$ii]
            }
            [void]$AuxU_8.Add( 
                ([System.Text.UTF32Encoding]::UTF8.GetBytes($s) | 
                    ForEach-Object { '{0:X2}' -f $_}) -join ''
            )
        }
    }
    end { $AuxU_8 -join '' }
}

# Set variables
$hashUbuntu = '5d944e44149fece685d3eb71fb94e71b'
$hashUbuntu   <# copied from 'Ubuntu 20.04 LTS' in Wsl2:
              cd `wslpath -a 'D:\\bat'`
              md5sum *.txt | awk '{ print $1 }' | md5sum | awk '{ print $1 }'
              <##>
$LF = [char]0x0A   # Line Feed (LF)
$path = 'D:\Bat'   # testing directory


$filenames = 'D:\bat\md5sum_Ubuntu_awk.lst'
<# obtained from 'Ubuntu 20.04 LTS' in Wsl2:
    cd `wslpath -a 'D:\\bat'`
    md5sum *.txt | awk '{ print $1 }' > md5sum_Ubuntu_awk.lst
    md5sum md5sum_Ubuntu_awk.lst | awk '{ print $1 }' # for reference
<##>

# Test #1: is `Get-FileHash` the same (beyond character case)?
$hashFile = Get-FileHash -Algorithm MD5 -Path $filenames |
                Select-Object -ExpandProperty Hash
$hashFile.ToLower() -ceq $hashUbuntu

# Test #2: is `$stringToHash` well-defined? is `Get-StringHash` the same?
$hashArray = Get-Content $filenames -Encoding UTF8
$stringToHash = ($hashArray -join $LF) + $LF
(Get-StringHash -InputObject $stringToHash) -eq $hashUbuntu 

# Test #3: another check: is `Get-StringHash` the same?
Push-Location -Path $path
$filesInBashOrder = bash.exe -c "ls -1 *.txt"
$hashArray = $filesInBashOrder |
    Foreach-Object {
        $hash = Get-FileHash -Algorithm MD5 -Path (
                        Join-Path -Path $path -ChildPath $_) |
                    Select-Object -ExpandProperty "Hash"
        $hash.tolower()
    }
$stringToHash = ($hashArray -join $LF) + $LF
(Get-StringHash -InputObject $stringToHash) -eq $hashUbuntu
Pop-Location

# Solution - ordinal order assuming `LC_COLLATE="C.UTF-8"` in Linux
Push-Location -Path $path
$hashArray = Get-ChildItem -Filter *.txt -Force -ErrorAction SilentlyContinue |
    Where-Object {$_.Name -clike "*.txt"} | # only if `shopt -u nocaseglob`
    Sort-Object -Property { (ConvertTo-Utf8String -InputObject $_.Name) } |
    Get-FileHash -Algorithm MD5 |
        Select-Object -ExpandProperty "Hash" |
    Foreach-Object {
        $_.ToLower()
    }
$stringToHash = ($hashArray -join $LF) + $LF
(Get-StringHash -InputObject $stringToHash).ToLower() -ceq $hashUbuntu
Pop-Location

Output (tested on 278 files): .\SO\69181414.ps1

5d944e44149fece685d3eb71fb94e71b
True
True
True
True
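
The solution assumes LC_COLLATE="C.UTF-8" on the Linux side. If the Linux machines might run under different locales, one option is to pin collation when generating the reference hash, so that the glob is expanded byte-wise regardless of the machine's settings. A minimal sketch, assuming a POSIX sh and GNU md5sum; the directory and file names are hypothetical:

```shell
# Throwaway directory with hypothetical mixed-case file names.
workdir=$(mktemp -d)
printf 'one\n'   > "$workdir/B.txt"
printf 'two\n'   > "$workdir/a.txt"
printf 'three\n' > "$workdir/c.txt"

# LC_ALL=C applies to the child shell that expands *.txt, so the glob
# is sorted byte-wise: B.txt, a.txt, c.txt.
pinned=$(cd "$workdir" && LC_ALL=C sh -c 'md5sum *.txt' | awk '{ print $1 }' | md5sum | awk '{ print $1 }')

# Same files listed explicitly in byte-wise order, for comparison.
explicit=$(cd "$workdir" && md5sum B.txt a.txt c.txt | awk '{ print $1 }' | md5sum | awk '{ print $1 }')

echo "pinned:   $pinned"
echo "explicit: $explicit"

rm -rf "$workdir"
```

With the order fixed this way, the PowerShell side only needs to reproduce a byte-wise sort of the file names, as the Sort-Object step in the solution does.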