How to truncate the end of a binary file past known address using PowerShell?

Time:09-23

I apologize up front for the lengthy post, but I'm trying to include the script I've worked with and tested so far. I'm also very new to working with binary files and PowerShell, and I'm pulling out my hair here. I have a file where I must remove data from a known address to the end of the file. I have referenced multiple articles here on S.O., but the one that comes closest to what I want to accomplish is here, which links to an article I had also found here.

I feel like I'm really close, but I'm not sure I'm using the function correctly, as I'm having a bit of trouble sussing out the regex for a hex equivalent of ".*" to find 0 or more matches to delete the remaining data from the known address to the end of the file. Maybe I'm thinking too complex?

My known address is always 005A08B0, and nothing afterward ever has a repeatable pattern so I can't simply use a pattern like \xF0\x00\x01 or similar to search for.

This portion of the script is not changed - the function I assume would still be the same, and on a loose level, I understand what it's doing - streaming the file specified and going to the end of the file to find the number of matched regex patterns:

function ConvertTo-BinaryString {
    # converts the bytes of a file to a string that has a
    # 1-to-1 mapping back to the file's original bytes. 
    # Useful for performing binary regular expressions.
    [OutputType([String])]
    Param (
        [Parameter(Mandatory = $True, ValueFromPipeline = $True, Position = 0)]
        [ValidateScript( { Test-Path $_ -PathType Leaf } )]
        [String]$Path
    )

    $Stream = New-Object System.IO.FileStream -ArgumentList $Path, 'Open', 'Read'

    # Note: Codepage 28591 returns a 1-to-1 char to byte mapping
    $Encoding     = [Text.Encoding]::GetEncoding(28591)
    $StreamReader = New-Object System.IO.StreamReader -ArgumentList $Stream, $Encoding
    $BinaryText   = $StreamReader.ReadToEnd()

    $StreamReader.Close()
    $Stream.Close()

    return $BinaryText
}

This portion for my input file is super simple to digest:

$inputFile  = 'C:\StartFile.dat'
$outputFile = 'C:\EndFile_test.dat'
$fileBytes  = [System.IO.File]::ReadAllBytes($inputFile)
$binString  = ConvertTo-BinaryString -Path $inputFile

This is where things fall apart, and I assume this would be the only piece I have to really modify:

# This is the portion I am having a problem with - what do I need to do for this regex???
$re = [Regex]'[\x5A08B0]{30}*'

This portion seems like I should not have to modify much, as the position will naturally move through the file and offset itself after each found match?

# use a MemoryStream object to store the result
$ms  = New-Object System.IO.MemoryStream
$pos = $replacements = 0

$re.Matches($binString) | ForEach-Object {
    # write the part of the byte array before the match to the MemoryStream
    $ms.Write($fileBytes, $pos, $_.Index)
    # update the 'cursor' position for the next match
    $pos += ($_.Index + $_.Length)
    # and count the number of replacements done
    $replacements++
}

# write the remainder of the bytes to the stream
$ms.Write($fileBytes, $pos, $fileBytes.Count - $pos)

# save the updated bytes to a new file (will overwrite existing file)
[System.IO.File]::WriteAllBytes($outputFile, $ms.ToArray())
$ms.Dispose()

if ($replacements) {
    Write-Host "$replacements replacement(s) made."
}
else {
    Write-Host "Byte sequence not found. No replacements made."
}

Additionally, I have also tried the following to at least see if I could determine the appropriate address is being referenced on a known file, and this seems like it might be a good start to something different:

#Decimal Equivalent of the Hex Address:
$offset = 5900464

$bytes = [System.IO.File]::ReadAllBytes("C:\TestFile.dat");
Echo $bytes[$offset]

When I run the smaller script above, I at least get the right character from the known file: it prints the decimal value of the ASCII character stored at that offset.
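For reference, the same check can be sketched without loading the whole file into memory. PowerShell accepts 0x-prefixed hex literals directly, so the address does not need to be converted to decimal by hand. The helper name Get-ByteAt is hypothetical (it is not part of the original script), and the sketch assumes a full path is passed, since .NET file methods do not follow the PowerShell working location.

```powershell
# PowerShell parses 0x-prefixed literals as integers,
# so this is the same value as 5900464.
$offset = 0x5A08B0

# Hypothetical helper: return the single byte stored at $Offset in $Path.
function Get-ByteAt {
    param(
        [Parameter(Mandatory = $true)][string]$Path,   # assumed to be a full path
        [Parameter(Mandatory = $true)][long]$Offset
    )
    $fs = [System.IO.File]::OpenRead($Path)
    try {
        if ($Offset -lt 0 -or $Offset -ge $fs.Length) {
            throw "Offset $Offset is outside the file (length $($fs.Length))."
        }
        # Move the read position to the requested offset
        [void]$fs.Seek($Offset, [System.IO.SeekOrigin]::Begin)
        # ReadByte returns the byte value as an integer
        return $fs.ReadByte()
    }
    finally {
        $fs.Close()
    }
}

# Example with the file from the question, formatted as two hex digits:
# '{0:X2}' -f (Get-ByteAt -Path 'C:\TestFile.dat' -Offset $offset)
```

Because the offset is a zero-based byte index, it is also the number of bytes that precede the address, which is why it can be used directly as a target file length.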

I can do this manually w/ a hex-editor, but this has to be possible from a script. . . Appreciate all the help I can get. A few disclosures - it has to be done with programs native to windows 7/windows 10 - cannot download any separate executables, and SysInternals is a no-go as well. Was originally looking at a batch file idea, but I can port a PowerShell command into a batch file easy peasy.

Answer:

To simply truncate a file, i.e. to remove any content beyond a given byte offset, you can use System.IO.File's static OpenWrite() method to obtain a System.IO.FileStream instance and call its .SetLength() method:

$inputFile  = 'C:\StartFile.dat'
$outputFile = 'C:\EndFile_test.dat'

# First, copy the input file to the output file.
Copy-Item -LiteralPath $inputFile -Destination $outputFile

# Open the output file for writing.
$fs = [System.IO.File]::OpenWrite($outputFile)

# Set the file length based on the desired byte offset
# in order to truncate it (assuming it is larger).
$fs.SetLength(0x5A08B0)

$fs.Close()
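As a variation on the copy-then-truncate approach above, the following sketch writes only the leading bytes to the output file, which avoids copying the tail of a large file just to discard it. This is an alternative under stated assumptions, not the method shown above: the function name Copy-FirstBytes and the 64 KB buffer size are hypothetical choices, and full paths are assumed. It uses only calls that are also available in the PowerShell 2.0 shipped with Windows 7.

```powershell
# Hypothetical helper: write the first $Length bytes of $Source to $Destination.
function Copy-FirstBytes {
    param(
        [Parameter(Mandatory = $true)][string]$Source,       # assumed full path
        [Parameter(Mandatory = $true)][string]$Destination,  # assumed full path
        [Parameter(Mandatory = $true)][long]$Length
    )
    $in = [System.IO.File]::OpenRead($Source)
    try {
        # Create() overwrites an existing destination file
        $out = [System.IO.File]::Create($Destination)
        try {
            $buffer    = New-Object byte[] 65536
            # Never try to copy more than the source actually contains
            $remaining = [Math]::Min($Length, $in.Length)
            while ($remaining -gt 0) {
                $want = [int][Math]::Min($buffer.Length, $remaining)
                $read = $in.Read($buffer, 0, $want)
                if ($read -le 0) { break }   # unexpected end of file
                $out.Write($buffer, 0, $read)
                $remaining -= $read
            }
        }
        finally { $out.Close() }
    }
    finally { $in.Close() }
}

# Example with the paths from the question:
# Copy-FirstBytes -Source 'C:\StartFile.dat' -Destination 'C:\EndFile_test.dat' -Length 0x5A08B0
```

Since the copy stops at the source length, this variant can never make the output larger than the input.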

Note: If the given offset is larger than the current file size, the call enlarges the file instead of truncating it. In a quick test on macOS and Windows the additional space was filled with NUL (0x0) bytes, but this behavior does not appear to be guaranteed, judging by the .SetLength() documentation:

If the stream is expanded, the contents of the stream between the old and the new length are undefined.
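One way to sidestep the undefined-contents case is to check the current length first and only call .SetLength() when it would actually shrink the file. The sketch below is a minimal illustration; the function name Limit-FileLength and its true/false return convention are hypothetical, and a full path is assumed.

```powershell
# Hypothetical helper: truncate $Path to $Length bytes, but never enlarge it.
# Returns $true if the file was truncated, $false if it was left unchanged.
function Limit-FileLength {
    param(
        [Parameter(Mandatory = $true)][string]$Path,   # assumed full path
        [Parameter(Mandatory = $true)][long]$Length
    )
    $fs = [System.IO.File]::OpenWrite($Path)
    try {
        # Skip negative lengths and anything that would not shrink the file
        if ($Length -lt 0 -or $Length -ge $fs.Length) {
            return $false
        }
        $fs.SetLength($Length)
        return $true
    }
    finally {
        # Runs on both the early return and the normal path
        $fs.Close()
    }
}

# Example with the output file from the answer:
# Limit-FileLength -Path 'C:\EndFile_test.dat' -Length 0x5A08B0
```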
