I want to get the total size in bytes of the 32 largest files in a folder and its subfolders, but it's working very slowly. We have about 10 TB of data.
Command:
$big32 = Get-ChildItem c:\temp -recurse | Sort-Object length -descending | select-object -first 32 | measure-object -property length -sum
$big32.sum /1gb
Answer:
The following improves performance using only PowerShell cmdlets. Using System.IO.Directory.EnumerateFiles() as a basis, as suggested in the other answer, might improve performance further, but you should do your own measurements to compare.
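# Collect only the Length values (bytes) of all files, keep the 32 largest and sum them.
# Sort-Object -Top requires PowerShell 6.1 or later.
(Get-ChildItem c:\temp -Recurse -File).ForEach('Length') |
    Sort-Object -Descending -Top 32 |
    Measure-Object -Sum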
This should reduce memory consumption considerably, because Sort-Object only has to sort an array of numbers instead of an array of FileInfo objects. It may also be somewhat faster due to better caching: an array of numbers can be stored in a contiguous, cache-friendly block of memory, whereas an array of objects stores only the references contiguously, while the objects themselves can be scattered all around memory.
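Note the use of .ForEach('Length') instead of just .Length. Because of member enumeration ambiguity, .Length on the array returns the array's own Length property (the number of files) rather than the length of each file.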
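To see the ambiguity in isolation, here is a small sketch that uses two strings in place of FileInfo objects (an assumption for illustration only; strings also have a Length property):

```powershell
# Two strings stand in for FileInfo objects: both types expose a Length property.
$items = @('ab', 'cde')

# .Length on the array resolves to the array's own Length (the element count),
# so member enumeration never reaches the elements.
$count = $items.Length               # 2

# .ForEach('Length') reads the property from each element instead.
$lengths = $items.ForEach('Length')  # 2, 3
```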
Using the -Top parameter of Sort-Object (available in PowerShell 6.1 and later) lets us drop the Select-Object cmdlet, further reducing pipeline overhead.
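A small sketch with made-up numbers standing in for file lengths, showing that both forms return the same items (the -Top variant needs PowerShell 6.1 or later):

```powershell
# Sample numbers standing in for file lengths.
$sizes = 5, 1, 9, 3, 7

# Sort everything, then keep the first two items with a second cmdlet...
$viaSelect = $sizes | Sort-Object -Descending | Select-Object -First 2

# ...or let Sort-Object return only the two largest items directly.
$viaTop = $sizes | Sort-Object -Descending -Top 2
```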
Answer:
There is still room for improvement, especially in memory usage, but the following should be considerably faster than Get-ChildItem:
# Enumerate all file paths under c:\temp, including subdirectories.
[System.IO.Directory]::EnumerateFiles('c:\temp', '*.*', [System.IO.SearchOption]::AllDirectories) |
    ForEach-Object {
        # Keep each file's path together with its size in bytes.
        [PSCustomObject]@{
            filename = $_
            length   = [System.IO.FileInfo]::New($_).Length
        }
    } |
    Sort-Object length -Descending |
    Select-Object -First 32
Edit
I would try implementing an implicit heap (a binary heap stored in a plain array) to reduce memory usage without hurting performance (it might even improve performance; to be tested).
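A minimal sketch of that implicit-heap idea, assuming file sizes arrive one at a time through the pipeline: a min-heap with K slots stored in a plain array keeps the largest sizes seen so far, so memory stays at K numbers no matter how many files there are. Measure-TopK is a hypothetical function name, not an existing cmdlet.

```powershell
function Measure-TopK {
    # Hypothetical helper (not a built-in cmdlet): sums the $K largest numbers
    # received from the pipeline while keeping only $K of them in memory.
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, ValueFromPipeline)] [long] $Value,
        [ValidateScript({ $_ -gt 0 })] [int] $K = 32
    )
    begin {
        # Implicit binary min-heap: the children of index i sit at 2i+1 and 2i+2,
        # so $heap[0] is always the smallest of the values kept so far.
        $heap = [long[]]::new($K)
        $count = 0
    }
    process {
        if ($count -lt $K) {
            # Heap not full yet: append the value and sift it up.
            $i = $count
            $heap[$i] = $Value
            $count++
            while ($i -gt 0) {
                $parent = ($i - 1) -shr 1
                if ($heap[$parent] -le $heap[$i]) { break }
                $heap[$parent], $heap[$i] = $heap[$i], $heap[$parent]
                $i = $parent
            }
        }
        elseif ($Value -gt $heap[0]) {
            # Larger than the smallest kept value: replace the root and sift it down.
            $heap[0] = $Value
            $i = 0
            while ($true) {
                $left = 2 * $i + 1
                $right = $left + 1
                $smallest = $i
                if ($left -lt $K -and $heap[$left] -lt $heap[$smallest]) { $smallest = $left }
                if ($right -lt $K -and $heap[$right] -lt $heap[$smallest]) { $smallest = $right }
                if ($smallest -eq $i) { break }
                $heap[$smallest], $heap[$i] = $heap[$i], $heap[$smallest]
                $i = $smallest
            }
        }
    }
    end {
        # Only the first $count slots are filled when fewer than $K values arrived.
        $sum = [long]0
        for ($j = 0; $j -lt $count; $j++) { $sum += $heap[$j] }
        $sum
    }
}
```

A possible (unmeasured) usage: `Get-ChildItem c:\temp -Recurse -File | ForEach-Object Length | Measure-TopK -K 32`. Each item runs through interpreted PowerShell code here, so it may well be slower than Sort-Object -Top; compare both on your own data before adopting it.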
Edit 2
If the file names are not required, the easiest way to save memory is to leave them out of the results.
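# Emit only the file sizes. The results are plain Int64 values, which have no
# "length" property, so sort the numbers themselves instead of "Sort-Object length".
[System.IO.Directory]::EnumerateFiles('c:\temp', '*.*', [System.IO.SearchOption]::AllDirectories) |
    ForEach-Object {
        [System.IO.FileInfo]::New($_).Length
    } |
    Sort-Object -Descending |
    Select-Object -First 32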
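A possible variation, which I have not benchmarked: as far as I know, the FileInfo objects yielded by System.IO.DirectoryInfo.EnumerateFiles() already carry the size read during the directory scan, whereas [System.IO.FileInfo]::New($_).Length creates a new FileInfo per path and queries the file system again for it. The sketch below also sums the result, as the question asks. Get-LargestFilesTotal is a hypothetical helper name.

```powershell
function Get-LargestFilesTotal {
    # Hypothetical helper: total size in bytes of the $Count largest files under $Path.
    param(
        [Parameter(Mandatory)] [string] $Path,
        [int] $Count = 32
    )
    # .NET resolves relative paths against the process working directory,
    # not PowerShell's current location, so pass a full path.
    $root = [System.IO.DirectoryInfo]::new($Path)

    # EnumerateFiles streams FileInfo objects for the whole tree.
    # Sort-Object -Top requires PowerShell 6.1 or later.
    $root.EnumerateFiles('*', [System.IO.SearchOption]::AllDirectories) |
        ForEach-Object Length |
        Sort-Object -Descending -Top $Count |
        Measure-Object -Sum |
        ForEach-Object Sum
}
```

For the question's case this would be `(Get-LargestFilesTotal -Path 'c:\temp') / 1GB`.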