I am a programming enthusiast and a novice. I am using PowerShell to try to solve the following need:
- I need to extract the full path of every file with the .img extension inside a folder tree containing about 900 thousand folders and about a million files, roughly 900,000 of which are .img files.
- Each .img file must be processed by an exe, with the file paths read from a file.
Which is better: to store the result of Get-ChildItem in a variable or in a file?
I would greatly appreciate your guidance and support in optimizing this, or in finding the best trade-off between processing speed and resource consumption.
Thank you in advance!!
This is the code I am currently using:
$PSDefaultParameterValues['*:Encoding'] = 'Ascii'
$host.ui.RawUI.WindowTitle = "DICOM IMPORT IN PROGRESS"
#region SET WINDOW FIXED WIDTH
$pshost = get-host
$pswindow = $pshost.ui.rawui
$newsize = $pswindow.buffersize
$newsize.height = 3000
$newsize.width = 150
$pswindow.buffersize = $newsize
$newsize = $pswindow.windowsize
$newsize.height = 50
$newsize.width = 150
$pswindow.windowsize = $newsize
#endregion
#
$out = ("$pwd\log_{0:yyyyMMdd_HH.mm.ss}_import.txt" -f (Get-Date))
cls
"`n" | tee -FilePath $out -Append
"*****************" | tee -FilePath $out -Append
"**IMPORT SCRIPT**" | tee -FilePath $out -Append
"*****************" | tee -FilePath $out -Append
"`n" | tee -FilePath $out -Append
#
# SET SEARCH FOLDERS #
"Working Folder" | tee -FilePath $out -Append
$path1 = Read-Host "Enter folder location" | tee -FilePath $out -Append
"`n" | tee -FilePath $out -Append
#
#
# SET & SHOW HOSTNAME
"SERVER NAME" | tee -FilePath $out -Append
$ht = hostname | tee -FilePath $out -Append
Write-Host $ht
Start-Sleep -Seconds 3
"`n" | tee -FilePath $out -Append
#
#
# GET FILES
"`n" | tee -FilePath $out -Append
#"SEARCHING IMG FILES, PLEASE WAIT..." | tee -FilePath $out -Append
$files = $path1 | Get-ChildItem -recurse -file -filter *.img | ForEach-Object { $_.FullName }
# SHOW Get-ChildItem PROCESS ON CONSOLE
Out-host -InputObject $files
"`n" | tee -FilePath $out -Append
Write-Output ($files | Measure).Count "IMG FILES FOUND TO PUSH" | tee -FilePath $out -Append
# DUMP Get-ChildItem OUTPUT INTO A FILE
$files > $pwd\pf
Start-Sleep -Seconds 5
# TIMESTAMP
"`n" | tee -FilePath $out -Append
"IMPORT START" | tee -FilePath $out -Append
("{0:yyyy/MM/dd HH:mm:ss}" -f (Get-Date)) | tee -FilePath $out -Append
"********************************" | tee -FilePath $out -Append
"`n" | tee -FilePath $out -Append
#
#
#SET TOOL
$ir = $Env:folder_tool
$pt = "utils\tool.exe"
#
#PROCESSING FILES
$n = 1
$pe = foreach ($file in Get-Content $pwd\pf ) {
    $tb = (Get-Date -f HH:mm:ss) | tee -FilePath $out -Append
    $fp = "$n. $file" | tee -FilePath $out -Append
    #
    $ep = & $ir$pt -c $ht"FIR" -i $file | tee -FilePath $out -Append
    $as = "`n" | tee -FilePath $out -Append
    # PRINT CONSOLE IMG FILES PROCESS
    Write-Host $tb
    Write-Host $fp
    Out-host -InputObject $ep
    Write-Host $as
    # INCREMENT THE FILE COUNTER
    $n++
}
#
#TIMESTAMP
"********************************" | tee -FilePath $out -Append
"IMPORT END" | tee -FilePath $out -Append
("{0:yyyy/MM/dd HH:mm:ss}" -f (Get-Date)) | tee -FilePath $out -Append
"`n" | tee -FilePath $out -Append
CodePudding user response:
Which is better: to store the result of Get-ChildItem in a variable or in a file?
If you're hoping to keep memory utilization low, the best solution is to not store them at all. Simply consume the output from Get-ChildItem directly in the pipeline:
$pe = Get-ChildItem -Path $path1 -Recurse -File -Filter *.img | ForEach-Object {
    $file = $_.FullName
    $tb = (Get-Date -f HH:mm:ss) | tee -FilePath $out -Append
    $fp = "$n. $file" | tee -FilePath $out -Append
    #
    $ep = & $ir$pt -c $ht"FIR" -i $file | tee -FilePath $out -Append
    $as = "`n" | tee -FilePath $out -Append
    # PRINT CONSOLE IMG FILES PROCESS
    Write-Host $tb
    Write-Host $fp
    Out-host -InputObject $ep
    Write-Host $as
    # INCREMENT THE FILE COUNTER
    $n++
}
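To make the streaming behaviour concrete, here is a minimal, self-contained sketch. It assumes a throwaway demo folder under the temp directory (the folder and file names are hypothetical) and uses a formatted string as a stand-in for the call to the external tool. Get-ChildItem hands each file to ForEach-Object as soon as it is found, so the full list of file objects is never held in a variable or written to disk.

```powershell
# Hypothetical demo tree; in the real script this would be $path1.
$root = Join-Path ([System.IO.Path]::GetTempPath()) ("imgdemo_" + [guid]::NewGuid().ToString('N'))
$sub  = Join-Path $root 'sub'
New-Item -ItemType Directory -Path $sub -Force | Out-Null
'a', 'b' | ForEach-Object { Set-Content -Path (Join-Path $root "$_.img") -Value 'x' }
Set-Content -Path (Join-Path $sub 'c.img')    -Value 'x'
Set-Content -Path (Join-Path $sub 'skip.txt') -Value 'x'   # must be ignored by the filter

$n = 0
$processed = Get-ChildItem -Path $root -Recurse -File -Filter *.img |
    ForEach-Object {
        # ForEach-Object runs its script block in the caller's scope,
        # so the counter keeps its value between items.
        $n++
        # Stand-in for invoking the external tool on $_.FullName.
        "$n. $($_.FullName)"
    }

$processed
"$n IMG FILES PROCESSED"

# Clean up the demo tree.
Remove-Item -Path $root -Recurse -Force
```

Note that $processed still collects whatever the script block emits, so for a run of this size it may be preferable to write each result straight to the log and emit nothing.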
CodePudding user response:
Try running the work in parallel with the PoshRSJob module.
Replace Start-Process in Process-File with your own code, and note that the jobs have no access to the console, so Process-File must return a string.
Adjust $JobCount and $inData to suit your environment.
The main idea is to load the whole file list into a ConcurrentQueue, start $JobCount background jobs (the processor count minus 2 in the code below), and wait for them to exit. Each job takes a value from the queue and passes it to Process-File, repeating until the queue is empty.
NOTE: If you stop the script, the RS jobs will continue to run until they finish or PowerShell is closed. Use Get-RSJob | Stop-RSJob and Get-RSJob | Remove-RSJob to stop the background work.
Import-Module PoshRSJob
Function Process-File
{
    Param(
        [String]$FilePath
    )
    $process = Start-Process -FilePath 'ping.exe' -ArgumentList '-n 5 127.0.0.1' -PassThru -WindowStyle Hidden
    $process.WaitForExit();
    return "Processed $FilePath"
}
$JobCount = [Environment]::ProcessorCount - 2
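# AllDirectories makes the enumeration recursive, as the question requires
$inData = [System.Collections.Concurrent.ConcurrentQueue[string]]::new(
    [System.IO.Directory]::EnumerateFiles('S:\SCRIPTS\FileTest', '*.img', [System.IO.SearchOption]::AllDirectories)
)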
$JobScript = [scriptblock]{
    $inQueue = [System.Collections.Concurrent.ConcurrentQueue[string]]$args[0]
    $outBag = [System.Collections.Concurrent.ConcurrentBag[string]]$args[1]
    $currentItem = $null
    while($inQueue.TryDequeue([ref] $currentItem) -eq $true)
    {
        try
        {
            # Add result to OutBag
            $result = Process-File -FilePath $currentItem -EA Stop
            $outBag.Add( $result )
        }
        catch
        {
            # Catch error
            Write-Output $_.Exception.ToString()
        }
    }
}
$resultData = [System.Collections.Concurrent.ConcurrentBag[string]]::new()
$i_cur = $inData.Count
$i_max = $i_cur
# Start jobs
$jobs = @(1..$JobCount) | % { Start-RSJob -ScriptBlock $JobScript -ArgumentList @($inData, $resultData) -FunctionsToImport @('Process-File') }
# Wait queue to empty
while($i_cur -gt 0)
{
    Write-Progress -Activity 'Doing job' -Status "$($i_cur) left of $($i_max)" -PercentComplete (100 - ($i_cur / $i_max * 100))
    Start-Sleep -Seconds 3 # Update frequency
    $i_cur = $inData.Count
}
# Wait jobs to complete
$logs = $jobs | % { Wait-RSJob -Job $_ } | % { Receive-RSJob -Job $_ }
$jobs | % { Remove-RSJob -Job $_ }
$Global:resultData = $resultData
$Global:logs = $logs
$Global:resultData is the collection of strings returned by Process-File, and $Global:logs holds any errors the jobs wrote to their output.
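As a rough illustration of the "replace Start-Process with your code" step, the sketch below shows one way Process-File might call an external tool and hand its output back as a single string. The -ToolPath and -ToolArgs parameters are hypothetical additions; their defaults are assumptions modelled on the question's script ($Env:folder_tool, utils\tool.exe, the -c and -i switches). The demo call at the end is Windows-only and uses cmd.exe as a stand-in for the real tool.

```powershell
Function Process-File
{
    Param(
        [String]$FilePath,
        # Hypothetical parameter: full path to the executable.
        # Assumed default mirrors $ir + $pt from the question.
        [String]$ToolPath = (Join-Path $Env:folder_tool 'utils\tool.exe'),
        # Hypothetical parameter: arguments placed before the file path.
        # Assumed default mirrors '-c <hostname>FIR -i' from the question.
        [String[]]$ToolArgs = @('-c', "$([Environment]::MachineName)FIR", '-i')
    )
    # Run the tool and collapse its stdout lines into one string,
    # because a background job cannot write to the console.
    $output = & $ToolPath @ToolArgs $FilePath | Out-String
    $stamp = Get-Date -Format 'HH:mm:ss'
    return "$stamp $FilePath (exit code $LASTEXITCODE)`n$output"
}

# Windows-only demo: cmd.exe simply echoes the path back.
$result = Process-File -FilePath 'C:\data\a.img' -ToolPath 'cmd.exe' -ToolArgs '/c', 'echo'
$result
```

Returning the timestamp, path, and exit code in the string means the per-file log lines from the original script can be rebuilt from $Global:resultData after the jobs finish.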