For PowerShell 2.0 on Windows Server 2008, I need to find the newest file in a directory tree with about 1.6 million files.
I know I can use Get-ChildItem like this:
$path="G:\Calls"
$filter='*.wav'
$lastFile = Get-ChildItem -Recurse -Path $path -Include $filter | Sort-Object -Property LastWriteTime | Select-Object -Last 1
$lastFile.Name
$lastFile.LastWriteTime
The issue is that finding the newest file takes a very long time because of the sheer number of files.
Is there a faster way to do it?
Answer:
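Sort-Object is slow here because it has to collect every item and compare the items against each other before it can output anything, while you only need the single newest file.
You don't need to sort at all: just go over each file once and keep track of the newest one seen so far: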
Get-ChildItem -Path $path -Filter $filter -Recurse | ForEach-Object `
    -Begin { $Newest = $Null } `
    -Process { if (-not $_.PSIsContainer -and $_.LastWriteTime -gt $Newest.LastWriteTime) { $Newest = $_ } } `
    -End { $Newest }
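The same running-maximum idea can also be packaged as a reusable advanced function, where the begin/process/end blocks play the role of ForEach-Object's -Begin/-Process/-End. This is a sketch: Select-NewestItem is a hypothetical name, not a built-in cmdlet, and it only uses features available in PowerShell 2.0.

```powershell
# Hypothetical helper (not a built-in cmdlet): keeps only the newest item
# seen so far, so memory use stays flat however many files stream through.
function Select-NewestItem {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory = $true, ValueFromPipeline = $true)]
        $InputObject
    )
    begin {
        $newest = $null
    }
    process {
        # PowerShell 2.0 has no -File switch, so skip directories here.
        if ($InputObject.PSIsContainer) { return }
        if ($null -eq $newest -or $InputObject.LastWriteTime -gt $newest.LastWriteTime) {
            $newest = $InputObject
        }
    }
    end {
        # Emit the newest file, or nothing if no file was seen.
        if ($null -ne $newest) { $newest }
    }
}
```

Usage, with the path and filter from the question: Get-ChildItem -Path 'G:\Calls' -Filter '*.wav' -Recurse | Select-NewestItem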
Answer:
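There are a couple of things that can be done to improve performance.
First, use -Filter rather than -Include: the filter is passed to the underlying file-system provider (the Win32 API) and applied while the directory is enumerated, whereas -Include is evaluated by PowerShell afterwards, so -Filter is faster.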
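To check how much -Filter actually saves on your own share, something like the function below times both variants. Measure-WavSearch is a made-up name for this sketch. The second search benefits from a warm file-system cache, so run it a few times (or swap the order) before trusting the numbers. The two counts can also differ slightly, because Win32 wildcard matching may also consider 8.3 short names (so *.wav can match a name like x.wave).

```powershell
# Hypothetical helper: times the same recursive search with -Filter and -Include.
# Measure-Object counts results as they stream past, so the full file list
# is never held in memory.
function Measure-WavSearch {
    param(
        [Parameter(Mandatory = $true)][string]$Path,
        [string]$Pattern = '*.wav'
    )
    # -Filter: the pattern is handed to the file-system provider.
    $sw = [System.Diagnostics.Stopwatch]::StartNew()
    $filterCount = (Get-ChildItem -Path $Path -Recurse -Filter $Pattern | Measure-Object).Count
    $filterSeconds = $sw.Elapsed.TotalSeconds

    # -Include: PowerShell matches the pattern against each item itself.
    $sw = [System.Diagnostics.Stopwatch]::StartNew()
    $includeCount = (Get-ChildItem -Path $Path -Recurse -Include $Pattern | Measure-Object).Count
    $includeSeconds = $sw.Elapsed.TotalSeconds

    New-Object PSObject -Property @{
        FilterCount    = $filterCount
        FilterSeconds  = $filterSeconds
        IncludeCount   = $includeCount
        IncludeSeconds = $includeSeconds
    }
}
```

Usage: Measure-WavSearch -Path 'G:\Calls' | Format-List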
Also, because the script gathers all the files and then sorts them, it can create a very large memory footprint during the sorting phase. I don't know whether it's possible to query the MFT, or use some other mechanism that avoids retrieving every file and inspecting its LastWriteTime, but an alternative approach is to keep only the newest file seen so far:
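# -File needs PowerShell 3.0+; on 2.0, skip directories with PSIsContainer instead
gci -rec -filter *.wav | %{$v = $null}{if (!$_.PSIsContainer -and $_.LastWriteTime -gt $v.LastWriteTime){$v=$_}}{$v}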
I tried this with all files and saw the following:
measure-command{ ls -rec -file |sort lastwritetime|select -last 1}
. . .
TotalSeconds : 142.1333641
vs
measure-command { gci -rec -file | %{$v = $null}{if ($_.lastwritetime -gt $v.lastwritetime){$v=$_}}{$v} }
. . .
TotalSeconds : 87.7215093
which is a pretty good saving (roughly 38% less time). There may be additional ways to improve performance.
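One further idea, not benchmarked here: skip the Get-ChildItem pipeline and ask .NET for FileInfo objects directly, then keep the same running maximum. DirectoryInfo.GetFiles(string, SearchOption) exists in .NET 2.0, which PowerShell 2.0 runs on. Whether it is actually faster on a 1.6-million-file tree is an assumption to verify with Measure-Command. Note that it returns the whole array at once (so memory grows with the file count), throws if any subfolder is unreadable, and resolves relative paths against the process directory rather than the PowerShell location, so pass a full path. Get-NewestFileDotNet is a hypothetical name.

```powershell
# Hypothetical helper: enumerate with .NET instead of Get-ChildItem, then keep
# a running maximum over LastWriteTime.
function Get-NewestFileDotNet {
    param(
        [Parameter(Mandatory = $true)][string]$Path,   # use a full path
        [string]$Pattern = '*.wav'
    )
    $dir = New-Object System.IO.DirectoryInfo $Path
    # Returns one array with every match; throws if a subfolder is unreadable.
    $files = $dir.GetFiles($Pattern, [System.IO.SearchOption]::AllDirectories)

    $newest = $null
    foreach ($f in $files) {
        if ($null -eq $newest -or $f.LastWriteTime -gt $newest.LastWriteTime) {
            $newest = $f
        }
    }
    # Emit the newest file, or nothing if there were no matches.
    if ($null -ne $newest) { $newest }
}
```

Usage: Get-NewestFileDotNet -Path 'G:\Calls' | Select-Object Name, LastWriteTime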