Stream just part of a file using PowerShell and compute hash

I need to be able to identify some large binary files which have been copied and renamed between secure servers. To do this, I would like to be able to hash the first X bytes and the last X bytes of all the files. I need to do this with only what is available on a standard Windows 10 system with no additional software installed, so PowerShell seems like the right choice.

Some things that don't work:

  • I cannot read the entire file in and then extract the parts I want to hash. My objective is to minimize how much of each file I read, and reading the entire file defeats that purpose.
  • Reading moderately large portions of a file into a PowerShell variable appears to be pretty slow, so $hash.ComputeHash($moderatelyLargeVariable) doesn't seem like a viable solution.

I'm pretty sure I need to do $hash.ComputeHash($stream) where $stream only streams part of the file.

Thus far I've tried:

function Get-FileStreamHash {
    param (
        $FilePath,
        $Algorithm
    )

    $hash = [Security.Cryptography.HashAlgorithm]::Create($Algorithm)

    ## METHOD 0: See description below
    $stream = ([IO.StreamReader]"${FilePath}").BaseStream
    $hashValue = $hash.ComputeHash($stream)
    ## END of part I need help with

    # Convert to a hexadecimal string
    $hexHashValue = -join ($hashValue | ForEach-Object { "{0:x2}" -f $_ })
    $stream.Close()

    # return
    $hexHashValue
}

Method 0: This works, but it's streaming the whole file and thus doesn't solve my problem. For a 3GB file this takes about 7 seconds on my machine.

Method 1: $hashValue = $hash.ComputeHash((Get-Content -Path $FilePath -Stream "")). This also streams the whole file, and it takes far longer. For the same 3GB file it ran for more than 5 minutes before I cancelled, so I don't know the total duration.

Method 2: $hashValue = $hash.ComputeHash((Get-Content -Path $FilePath -Encoding byte -TotalCount $qtyBytes -Stream "")). This is the same as Method 1, except that it limits the content to $qtyBytes. At 1000000 (1MB) it takes 18 seconds. I think that means Method 1 would have taken ~15 hours, 7700x slower than Method 0.

Is there a way to do something like Method 2 (limit what is read) but without the slowdown? And if so, is there a good way to do it on just the end of the file?

Thanks!

CodePudding user response:

You could try one (or a combination of both) of the following helper functions to read a number of bytes from the beginning or from the end of a file:

function Read-FirstBytes {
    param (
        [Parameter(Mandatory = $true, ValueFromPipeline = $true, ValueFromPipelineByPropertyName = $true, Position = 0)]
        [Alias('FullName', 'FilePath')]
        [ValidateScript({ Test-Path -Path $_ -PathType Leaf })]
        [string]$Path,        
        
        [Parameter(Mandatory=$true, Position = 1)]
        [int]$Bytes,

        [ValidateSet('ByteArray', 'HexString', 'Base64')]
        [string]$As = 'ByteArray'
    )
    try {
        $stream = [System.IO.File]::OpenRead($Path)
        $length = [math]::Min([math]::Abs($Bytes), $stream.Length)
        $buffer = [byte[]]::new($length)
        $null   = $stream.Read($buffer, 0, $length)
        switch ($As) {
            'HexString' { ($buffer | ForEach-Object { "{0:x2}" -f $_ }) -join '' ; break }
            'Base64'    { [Convert]::ToBase64String($buffer) ; break }
            default     { ,$buffer }
        }
    }
    catch { throw }
    finally { if ($stream) { $stream.Dispose() } }  # guard: $stream is unset if OpenRead threw
}

function Read-LastBytes {
    param (
        [Parameter(Mandatory = $true, ValueFromPipeline = $true, ValueFromPipelineByPropertyName = $true, Position = 0)]
        [Alias('FullName', 'FilePath')]
        [ValidateScript({ Test-Path -Path $_ -PathType Leaf })]
        [string]$Path,        
        
        [Parameter(Mandatory=$true, Position = 1)]
        [int]$Bytes,

        [ValidateSet('ByteArray', 'HexString', 'Base64')]
        [string]$As = 'ByteArray'
    )
    try {
        $stream = [System.IO.File]::OpenRead($Path)
        $length = [math]::Min([math]::Abs($Bytes), $stream.Length)
        $null   = $stream.Seek(-$length, 'End')
        $buffer = for ($i = 0; $i -lt $length; $i++) { $stream.ReadByte() }
        switch ($As) {
            'HexString' { ($buffer | ForEach-Object { "{0:x2}" -f $_ }) -join '' ; break }
            'Base64'    { [Convert]::ToBase64String($buffer) ; break }
            default     { ,[Byte[]]$buffer }
        }
    }
    catch { throw }
    finally { if ($stream) { $stream.Dispose() } }  # guard: $stream is unset if OpenRead threw
}

Then you can compute a hash value from it and format as you like.

Combinations are possible like

$begin = Read-FirstBytes -Path 'D:\Test\somefile.dat' -Bytes 50    # take the first 50 bytes
$end   = Read-LastBytes -Path 'D:\Test\somefile.dat' -Bytes 1000   # and the last 1000 bytes

$Algorithm = 'MD5'
$hash  = [Security.Cryptography.HashAlgorithm]::Create($Algorithm)
$hashValue = $hash.ComputeHash($begin + $end)

($hashValue | ForEach-Object { "{0:x2}" -f $_ }) -join ''
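
If you need both pieces hashed together for many files, the two helpers can be wrapped into one function. Here is a minimal sketch; Get-PartialFileHash and its parameter defaults are illustrative, not part of the answer above:

function Get-PartialFileHash {
    param (
        [string]$Path,
        [int]$Bytes = 1MB,              # how much to take from each end of the file
        [string]$Algorithm = 'SHA256'   # any name HashAlgorithm::Create accepts
    )
    # Assumes the Read-FirstBytes and Read-LastBytes functions defined above.
    $begin = Read-FirstBytes -Path $Path -Bytes $Bytes
    $end   = Read-LastBytes  -Path $Path -Bytes $Bytes
    $hash  = [Security.Cryptography.HashAlgorithm]::Create($Algorithm)
    try {
        $hashValue = $hash.ComputeHash([byte[]]($begin + $end))
        ($hashValue | ForEach-Object { '{0:x2}' -f $_ }) -join ''
    }
    finally { $hash.Dispose() }
}

# Example: fingerprint every file in a folder by its first and last 1MB
# Get-ChildItem 'D:\Test' -File |
#     Select-Object Name, @{ n = 'Hash'; e = { Get-PartialFileHash -Path $_.FullName } }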

CodePudding user response:

I believe this would be a more efficient way of reading the last bytes of your file using System.IO.BinaryReader. You can combine this function with the one you have; it can read all bytes, the last n bytes (-Last), or the first n bytes (-First).

function Read-Bytes {
    # Default set added so the function resolves when called with just -Path.
    [CmdletBinding(DefaultParameterSetName = 'AllBytes')]
    param(
        [Parameter(
            Mandatory,
            ValueFromPipelineByPropertyName,
            Position = 0
        )][Alias('FullName')]
        [ValidateScript({
            if (Test-Path $_ -PathType Leaf) {
                return $true
            }
            throw 'Invalid File Path'
        })]
        [System.IO.FileInfo]$Path,

        [Parameter(
            HelpMessage = 'Specifies the number of bytes from the beginning of a file.',
            ParameterSetName = 'FirstBytes',
            Position = 1
        )]
        [int]$First,

        [Parameter(
            HelpMessage = 'Specifies the number of bytes from the end of a file.',
            ParameterSetName = 'LastBytes',
            Position = 1
        )]
        [int]$Last
    )

    process {
        try {
            $reader = [System.IO.BinaryReader]::new(
                [System.IO.File]::Open(
                    $Path.FullName,
                    [System.IO.FileMode]::Open,
                    [System.IO.FileAccess]::Read
                )
            )

            $stream = $reader.BaseStream

            # Stop after -First bytes; otherwise read to the end of the file.
            $length = (
                $stream.Length, $First
            )[[int]($First -and $First -lt $stream.Length)]

            # Start -Last bytes before the end when -Last is given and smaller
            # than the file; otherwise start at the beginning.
            $stream.Position = (
                0, ($length - $Last)
            )[[int]($Last -and $length -gt $Last)]

            $bytes = while ($stream.Position -ne $length) {
                $stream.ReadByte()
            }

            [pscustomobject]@{
                FilePath = $Path.FullName
                Length   = $length
                Bytes    = $bytes
            }
        }
        catch {
            Write-Warning $_.Exception.Message
        }
        finally {
            # Guard: $reader is unset if File::Open threw.
            if ($reader) { $reader.Dispose() }
        }
    }
}

Usage

  • Get-ChildItem . -File | Read-Bytes -Last 100: reads the last 100 bytes of every file in the current folder. If the -Last argument exceeds the file length, it reads the entire file.
  • Get-ChildItem . -File | Read-Bytes -First 100: reads the first 100 bytes of every file in the current folder. If the -First argument exceeds the file length, it reads the entire file.
  • Read-Bytes -Path path/to/file.ext: reads all bytes of file.ext.

Output

Returns an object with the properties FilePath, Length, and Bytes.

FilePath                            Length Bytes
--------                            ------ -----
/home/user/Documents/test/......        14 {73, 32, 119, 111…}
/home/user/Documents/test/......         0 
/home/user/Documents/test/......         0 
/home/user/Documents/test/......         0 
/home/user/Documents/test/......       116 {111, 109, 101, 95…}
/home/user/Documents/test/......     17963 {50, 101, 101, 53…}
/home/user/Documents/test/......      3617 {105, 32, 110, 111…}
/home/user/Documents/test/......       638 {101, 109, 112, 116…}
/home/user/Documents/test/......         0 
/home/user/Documents/test/......        36 {65, 99, 114, 101…}
/home/user/Documents/test/......       735 {117, 112, 46, 79…}
/home/user/Documents/test/......      1857 {108, 111, 115, 101…}
/home/user/Documents/test/......        77 {79, 80, 69, 78…}
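
To tie this back to the original goal, the Bytes property can be fed straight into a hash algorithm. A rough sketch, assuming the Read-Bytes function above (the 1MB figure and MD5 choice are only examples):

$md5 = [Security.Cryptography.HashAlgorithm]::Create('MD5')
Get-ChildItem . -File | Read-Bytes -Last 1MB | ForEach-Object {
    if ($_.Bytes) {  # skip zero-length files, whose Bytes property is empty
        $hex = ($md5.ComputeHash([byte[]]$_.Bytes) |
            ForEach-Object { '{0:x2}' -f $_ }) -join ''
        [pscustomobject]@{ FilePath = $_.FilePath; Hash = $hex }
    }
}
$md5.Dispose()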