Powershell doesn't read files size correctly if these are in progress of change (i.e files bein

Time:12-20

I am writing a PowerShell script that shuts down a Windows machine (laptop/PC) once a download of one or more files has completed (for the sake of the example, let's assume a single large file).

In short, the script reads the size of the entire downloads folder every x seconds and compares the sizes before and after the delay. If the size has not changed during that interval, the download is either complete or stuck; both cases lead to shutdown.

Given the following script (the size getter can be a standalone function at some point, yes):

#Powershell5
#Major  Minor  Build  Revision
# -----  -----  -----  --------
# 5      1      19041  1320  

$downloadDir = "C:\Users\edi\Downloads"

while (1) 
{
    $s1 = Get-ChildItem -Path $downloadDir | Measure-Object -Property Length -Sum | Select-Object Sum
    write-host "S1:" $s1.sum

    # this 3 seconds time is just for testing purposes; in real case scenario this will most
    # likely be set to 60 or 120 seconds.
    Start-Sleep -s 3

    $s2 = Get-ChildItem -Path $downloadDir | Measure-Object -Property Length -Sum | Select-Object Sum
    write-host "S2:" $s2.sum

    if ($s1.sum -eq $s2.sum) 
    {
        write-host "Download complete, shutting down.."
        # loop exit is this, actual shutdown; commented out for testing purposes.
        #shutdown /s
    } 
}

The problem I am facing is that the file size is not read "in real time". In other words, the reported size does not change the way you would normally see it change in the Explorer view. I need to be able to read the changing file size in real time.

Interesting fact: while the download is in progress and the script is running, if I manually go to the downloads folder and hit F5 / Refresh, the numbers change (the size reading becomes accurate).

Side note: My research got me to this article that might present the root cause, but I am not 100% sure of it: https://devblogs.microsoft.com/oldnewthing/20111226-00/?p=8813

Appreciate any idea on this. Thanks in advance!

CodePudding user response:

I'm unable to reproduce this; you can use the following for testing. To demonstrate that the file size can be monitored in real time, [System.IO.StreamWriter] writes to a file, adding 1 byte on each iteration until the file reaches 1 MB. My only guess would be that the file being downloaded has pre-allocated space; however, I don't see how that would be possible since, as you explained, you can see the file size increasing in Explorer.

$checkSize = {
    (Get-ChildItem $PWD -File | Measure-Object -Property Length -Sum).Sum
}

$currentSize = & $checkSize
$testFile = Join-Path $pwd -ChildPath "testfile.dump"

$job = Start-Job {

    $writer = [System.IO.StreamWriter]::new(   
        [System.IO.File]::Create($using:testFile)
    )
    0..1Mb | ForEach-Object { $writer.Write(1) }
    $writer.Close()

} -Name testDump

'
Starting test:
'

do
{
    $increasingSizeBefore = & $checkSize
    Start-Sleep -Seconds 2
    $increasingSizeAfter = & $checkSize

    'StartingSize: {0} - SizeBefore: {1} - SizeAfter: {2}' -f
    $currentSize, $increasingSizeBefore, $increasingSizeAfter

} until ($increasingSizeBefore -eq $increasingSizeAfter)

$job | Stop-Job -PassThru | Remove-Job

Results for me:

Starting test:

StartingSize: 17458 - SizeBefore: 17458 - SizeAfter: 152626
StartingSize: 17458 - SizeBefore: 152626 - SizeAfter: 406578
StartingSize: 17458 - SizeBefore: 406578 - SizeAfter: 652338
StartingSize: 17458 - SizeBefore: 652338 - SizeAfter: 902194
StartingSize: 17458 - SizeBefore: 902194 - SizeAfter: 1066035
StartingSize: 17458 - SizeBefore: 1066035 - SizeAfter: 1066035

CodePudding user response:

I suggest adopting a different strategy:

  • Set an overall timeout for each download process, such as curl.exe's --max-time option.

  • Unfortunately, PowerShell's own Invoke-WebRequest and Invoke-RestMethod appear to have only a connection timeout (-TimeoutSec), not a timeout for the overall transfer.

That way you can track the download processes, and trigger a shutdown once all of them have terminated (whether due to completion or timeout).
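A minimal sketch of that strategy, assuming curl.exe is on the PATH (it ships with recent Windows 10 builds). The helper names New-CurlArgumentList and Start-TimedDownload, the one-hour default, and the example URL are hypothetical and chosen only for illustration:

```powershell
# Hypothetical helper: builds the curl.exe argument list for one download.
# --max-time caps the whole transfer, not just the connection phase.
function New-CurlArgumentList {
    param(
        [Parameter(Mandatory)] [string] $Uri,
        [Parameter(Mandatory)] [string] $OutFile,
        [int] $MaxTimeSeconds = 3600   # assumed default: one hour
    )
    @(
        '--silent', '--show-error', '--location'
        '--max-time', "$MaxTimeSeconds"
        # Embedded quotes keep paths and URLs with spaces intact.
        '--output', "`"$OutFile`""
        "`"$Uri`""
    )
}

# Hypothetical helper: launches curl.exe and returns its process object.
function Start-TimedDownload {
    param(
        [Parameter(Mandatory)] [string] $Uri,
        [Parameter(Mandatory)] [string] $OutFile,
        [int] $MaxTimeSeconds = 3600
    )
    $curlArgs = New-CurlArgumentList @PSBoundParameters
    Start-Process -FilePath 'curl.exe' -ArgumentList $curlArgs -NoNewWindow -PassThru
}

# Usage (example URL and path are placeholders):
# $procs = @(
#     Start-TimedDownload -Uri 'https://example.org/big.iso' -OutFile 'C:\Users\edi\Downloads\big.iso'
# )
# $procs | Wait-Process   # returns once every curl.exe has exited
# shutdown /s
```

Because every process ends either on completion or when --max-time expires, Wait-Process is guaranteed to return, and no polling of file sizes is needed.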


As for your approach:

  • The on-disk file size that you can query via Get-ChildItem is not updated continuously while a file is being written, as you've observed, and while it is eventually updated, that may not happen until the file has been closed, i.e. written in full.

  • However, you can update the file-size information on demand, namely via the System.IO.FileSystemInfo.Refresh() method, which is the equivalent of the manual refreshing you performed via File Explorer.

    • Note, however, that this still isn't real-time size information due to internal buffering of writes. For Invoke-WebRequest / Invoke-RestMethod, this buffer seems to be 4 KB.

# Perform this before every Measure-Object call.
# It refreshes the size information of all files in the specified dir.
(Get-ChildItem -File -LiteralPath $downloadDir).Refresh()
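As a sketch of how the refresh could be folded into the polling loop from the question, the two functions below refresh each file before summing its length and then wait until two consecutive readings match. The names Get-DirectorySize and Wait-DirectorySizeStable and the 60-second default are assumptions for illustration, not part of the original script:

```powershell
# Hypothetical helper: sums the sizes of all files directly inside a directory,
# refreshing each file's metadata first so that stale sizes are not reported.
function Get-DirectorySize {
    param(
        [Parameter(Mandatory)] [string] $LiteralPath
    )
    $total = [long] 0
    foreach ($file in Get-ChildItem -File -LiteralPath $LiteralPath) {
        $file.Refresh()          # System.IO.FileSystemInfo.Refresh()
        $total += $file.Length
    }
    $total
}

# Hypothetical helper: blocks until two readings taken $IntervalSeconds apart
# are equal, then returns the final size in bytes.
function Wait-DirectorySizeStable {
    param(
        [Parameter(Mandatory)] [string] $LiteralPath,
        [int] $IntervalSeconds = 60
    )
    do {
        $before = Get-DirectorySize -LiteralPath $LiteralPath
        Start-Sleep -Seconds $IntervalSeconds
        $after = Get-DirectorySize -LiteralPath $LiteralPath
    } until ($before -eq $after)
    $after
}

# Usage:
# Wait-DirectorySizeStable -LiteralPath 'C:\Users\edi\Downloads' | Out-Null
# shutdown /s
```

Given the write buffering noted above, the interval should be long enough for a slow download to flush at least one buffer; otherwise two equal readings may be mistaken for a finished download.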

As for narrowing down what downloads have completed vs. which ones are assumed to be stuck:

Invoke-WebRequest / Invoke-RestMethod exclusively lock their output files while a download is ongoing, so you can make a read attempt to see which files cannot be read from, from which you can infer what downloads, if any, are still ongoing:

# Note: In PowerShell (Core) 7, use -AsByteStream instead of -Encoding Byte
Get-ChildItem -File -LiteralPath $downloadDir | 
  Get-Content -Encoding Byte -First 1 -ErrorVariable errs -ErrorAction SilentlyContinue |
    Out-Null

if ($errs) { Write-Warning "Incomplete downloads:`n$($errs.TargetObject)" }
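
The same read attempt can also be made per file through .NET, which returns a Boolean instead of requiring collected errors to be parsed. This is only a sketch: Test-FileLocked is a hypothetical name, and it assumes a full path is passed, because .NET resolves relative paths against the process working directory rather than the current PowerShell location.

```powershell
# Hypothetical helper: returns $true if the file cannot currently be opened
# for reading, which suggests another process holds it exclusively.
function Test-FileLocked {
    param(
        [Parameter(Mandatory)] [string] $LiteralPath   # assumed to be a full path
    )
    try {
        $stream = [System.IO.File]::Open(
            $LiteralPath,
            [System.IO.FileMode]::Open,
            [System.IO.FileAccess]::Read,
            [System.IO.FileShare]::ReadWrite   # tolerate other readers and writers
        )
        $stream.Dispose()
        return $false
    }
    catch [System.IO.FileNotFoundException] {
        throw   # a missing file is an error, not a lock
    }
    catch [System.IO.IOException] {
        return $true   # sharing violation: the file is held exclusively
    }
}

# Usage:
# Get-ChildItem -File -LiteralPath $downloadDir |
#   Where-Object { Test-FileLocked -LiteralPath $_.FullName }
```

Like the Get-Content variant, this only detects writers that deny read sharing, so it should not be relied on for download tools that keep their output file readable.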