I have a PowerShell script that reads and parses a text file. The file is read into memory and then processed line by line.
When I switched from PowerShell 4.0 to 5.1, the script became about 10 times slower (60 seconds instead of 6 seconds). Does anyone have an idea how I can make the script run faster?
#------------------------------------------------------------------------
# Function GetNextLine.
# Read next text line from input text array into variables.
# Parameter:
# $inTextArr[]: (in) Input text array
# $linenr : (inout) line number, will be increased
# $line : (out) read text line
# $line2 : (out) read text line without leading and trailing spaces
# $s1 : (out) first word read in text line
#------------------------------------------------------------------------
function GetNextLine {
    param (
         [ref]$inTextArr
        ,[ref]$linenr
        ,[ref]$line
        ,[ref]$line2
        ,[ref]$s1
    )
    $linenr.value++
    $line.value  = $inTextArr.value[$linenr.value-1]
    $line2.value = $line.value.trim()
    $s1.value    = $line2.value.split(" ")[0]
} # function GetNextLine
#------------------------------------------------------------------------
#------------------------------------------------------------------------
# Function ParseMifFile.
# Parse input text array.
#------------------------------------------------------------------------
function ParseMifFile {
    param(
        [ref]$inTextArr
    )
    # Initialize output parameters and variables.
    [int]$linenr   = 0
    [string]$line  = ""
    [string]$line2 = ""
    [string]$s1    = ""
    # (Extract. The original script has lots of GetNextLine calls.)
    GetNextLine -inTextArr ([ref]$inTextArr.value) -linenr ([ref]$linenr) -line ([ref]$line) -line2 ([ref]$line2) -s1 ([ref]$s1)
    while ($line -cne "# End of MIFFile") {
        GetNextLine -inTextArr ([ref]$inTextArr.value) -linenr ([ref]$linenr) -line ([ref]$line) -line2 ([ref]$line2) -s1 ([ref]$s1)
    }
} # function ParseMifFile
#------------------------------------------------------------------------
# Prepare a large text array for the performance test below (just for test purposes, instead of reading the input file).
$inTextArr = @()
for ($i = 1; $i -lt 50000; $i++) {
    $inTextArr += "This is a line from the input file"
}
$inTextArr += "# End of MIFFile"
# Performance test of function ParseMifFile.
measure-command {
    ParseMifFile -inTextArr ([ref]$inTextArr)
    # Very slow in PowerShell 5.1.17763.2803 (60 sec) compared to PowerShell 4.0 (6 sec)
}
CodePudding user response:
Not sure why it's slower in 5.1. You're passing the entire content to each GetNextLine function call, though.
"Does anyone have an idea how I can make the script run faster?"
How's the time on this? It reads your file content and turns each line into a similar object up front.
$i = 0
$MyContentArray = Get-Content -Path "[C:\My\FilePath]" | ForEach-Object {
    [PSCustomObject]@{
        linenr = ++$i               # running line number
        line   = $_
        line2  = $_.trim()
        s1     = ($_.trim()).split(" ")[0]
    }
}
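Once the array of objects is built, the parse loop only has to walk it. A minimal sketch of what that could look like (the field names and the "# End of MIFFile" marker are taken from the code above; the loop body itself is only an illustration):
# Walk the pre-built objects instead of re-reading the whole array in a helper function.
foreach ($entry in $MyContentArray) {
    if ($entry.line -ceq "# End of MIFFile") { break }
    # ... work with $entry.line2 and $entry.s1 here, the same way GetNextLine's output is used in the original script ...
}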
CodePudding user response:
It seems to me that passing the text array to the function through a reference parameter is what takes so long.
As an alternative I created a custom object, as suggested in the answer above, which holds all the information about the currently read file line. The program is much faster: 1.5 seconds instead of 60 seconds.
#------------------------------------------------------------------------
# Function GetNextLine.
# Read next text line from the object's text array into the object's fields.
#------------------------------------------------------------------------
function GetNextLine {
    param (
        [PSCustomObject]$linePtrO
    )
    $linePtrO.linenr++
    $linePtrO.line  = $linePtrO.inTextArr[$linePtrO.linenr-1]
    $linePtrO.line2 = $linePtrO.line.trim()
    $linePtrO.s1    = $linePtrO.line2.split(" ")[0]
} # function GetNextLine
function ParseMifFile {
    param (
        [string]$inFileNameStr
    )
    # Custom object that bundles the file content and the current-line state.
    $linePtrO = [PSCustomObject] @{
        inTextArr = get-content -path $inFileNameStr -encoding utf8
        linenr    = 0
        line      = ""
        line2     = ""
        s1        = ""
    }
    # (Extract. The original script has lots of GetNextLine calls.)
    GetNextLine -linePtrO $linePtrO
    while ($linePtrO.line -cne "# End of MIFFile") {
        GetNextLine -linePtrO $linePtrO
    }
} # function ParseMifFile
$fileNameStr= "C:\mypath\myfile"
ParseMifFile -inFileNameStr $fileNameStr
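To check the speedup, the call can be wrapped in Measure-Command just as in the question (the measured time then also includes reading the file inside ParseMifFile):
# Timing check, analogous to the Measure-Command block in the question.
measure-command {
    ParseMifFile -inFileNameStr $fileNameStr
}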