Powershell file parsing very slow

Time:02-02

I am using PowerShell 7.

We have the following PowerShell script that parses some very large files.

I no longer want to use 'Get-Content', as it is too slow.

The script below works, but it takes a very long time to process even a 10 MB file.

I have about 200 files, each around 10 MB with over 10,000 lines.

Sample Log:

#Fields:1
#Fields:2
#Fields:3
#Fields:4
#Fields: date-time,connector-id,session-id,sequence-number,local-endpoint,remote-endpoint,event,data,context
2023-01-31T13:53:50.404Z,EXCH1\Relay-EXCH1,08DAD23366676FF1,41,10.10.10.2:25,195.85.212.22:15650,<,DATA,
2023-01-31T13:53:50.404Z,EXCH1\Relay-EXCH1,08DAD23366676FF1,41,10.10.10.2:25,195.85.212.25:15650,<,DATA,

Script:

$Output = @()
$LogFilePath = "C:\LOGS\*.log"
$LogFiles = Get-Item  $LogFilePath
$Count = @($logfiles).count

ForEach ($Log in $LogFiles)
{
    $Int = $Int + 1
    
    $Percent = $Int/$Count * 100

    Write-Progress -Activity "Collecting Log details" -Status "Processing log File $Int of $Count - $Log" -PercentComplete $Percent

    Write-Host "Processing Log File  $Log" -ForegroundColor Magenta
    Write-Host
    $FileContent = Get-Content $Log | Select-Object -Skip 5
    ForEach ($Line IN $FileContent)
    {

        $Socket = $Line  | Foreach {$_.split(",")[5] }

        $IP = $Socket.Split(":")[0]

        $Output += $IP

    } 
} 
$Output = $Output | Select-Object -Unique
$Output = $Output | Sort-Object

Write-Host "List of noted remove IPs:" 
$Output
Write-Host 
$Output | Out-File $PWD\Output.txt 

CodePudding user response:

As @iRon suggests, the addition assignment operator (+=) carries a lot of overhead, as does reading an entire file into a variable and then processing it. Consider processing it strictly as a pipeline instead. I achieved the same results, using your sample data, with the code written as below.

$LogFilePath = "C:\LOGS\*.log"
$LogFiles = Get-ChildItem $LogFilePath
$Count = @($logfiles).count

$Output = ForEach($Log in $Logfiles) {
    # Code for Write-Progress here
    Get-Content -Path $Log.FullName | Select-Object -Skip 5 | ForEach-Object {
        $Socket = $_.split(",")[5] 
        $IP = $Socket.Split(":")[0]
        $IP
    }
}
$Output = $Output | Select-Object -Unique
$Output = $Output | Sort-Object

Write-Host "List of noted remove IPs:" 
$Output
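
To illustrate the point about +=, here is a minimal sketch using made-up sample lines (the $lines data is hypothetical, not taken from your logs). Appending with += to a fixed-size array builds a new array on each iteration, whereas capturing the output of the foreach statement collects everything once. Both produce the same values.

```powershell
# Hypothetical sample rows in the same shape as the log lines in the question.
$lines = 1..2000 | ForEach-Object {
    "2023-01-31T13:53:50.404Z,EXCH1\Relay-EXCH1,08DAD23366676FF1,41,10.10.10.2:25,10.0.$($_ % 50).1:15650,<,DATA,"
}

# Slow pattern: arrays are fixed size, so each += creates a new array
# and copies the existing elements into it.
$slow = @()
foreach ($line in $lines) {
    $slow += $line.Split(',')[5].Split(':')[0]
}

# Faster pattern: emit each value from the loop and capture the result once.
$fast = foreach ($line in $lines) {
    $line.Split(',')[5].Split(':')[0]
}
```

Wrapping each loop in Measure-Command on your own files should show how much this matters for your data.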

CodePudding user response:

Apart from the notable points in the comments, I believe this question is more suitable to Code Review. Nonetheless, here's my take on this using the StreamReader class:

$LogFilePath = "C:\LOGS\*.log"
$LogFiles    = Get-Item -Path $LogFilePath
$OutPut      = [System.Collections.ArrayList]::new()

foreach ($log in $LogFiles)
{
    $skip = 0
    $stop = $false
    $stream = [System.IO.StreamReader]::new($log.FullName)
    while ($line = $stream.ReadLine())
    {
        if (-not $stop)
        {
            # Pre-increment so that exactly 5 header lines are skipped
            if (++$skip -eq 5)
            {
                $stop = $true
            }
            continue
        }
        elseif ($OutPut.Contains(($IP = ($line -split ',|:')[-5])))
        {
            continue
        }
        $null = $OutPut.Add($IP)
    }
    $stream.Close()
    $stream.Dispose()
}
# Display OutPut and save to file
Write-Host -Object "List of noted remove IPs:" 
$OutPut | Sort-Object | Tee-Object -FilePath "$PWD\Output.txt"

This way only unique IPs are collected, since the elseif checks each one against what is already in $OutPut, essentially replacing Select-Object -Unique. You should see a speed increase because you are no longer appending to a fixed-size array (+=) or piping to other cmdlets.
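
As a possible variation on the same idea (a sketch, not part of the answer above), a HashSet can take over the uniqueness check, since its Add method ignores values it already holds. The function name Get-UniqueRemoteIP and the -HeaderLines parameter are hypothetical, and the code assumes every data row has the remote endpoint in the sixth comma-separated field, as in the sample log.

```powershell
# Hypothetical helper: returns the sorted unique remote IPs from one log file.
function Get-UniqueRemoteIP {
    param(
        [Parameter(Mandatory)] [string] $Path,   # full path to the log file
        [int] $HeaderLines = 5                   # number of '#Fields' lines to skip
    )
    $seen   = [System.Collections.Generic.HashSet[string]]::new()
    $reader = [System.IO.StreamReader]::new($Path)
    try {
        $lineNumber = 0
        # Compare against $null so a blank line does not end the loop early.
        while ($null -ne ($line = $reader.ReadLine())) {
            $lineNumber++
            if ($lineNumber -le $HeaderLines -or $line.Length -eq 0) { continue }
            $socket = $line.Split(',')[5]          # remote-endpoint, e.g. 195.85.212.22:15650
            $null = $seen.Add($socket.Split(':')[0])  # Add returns $false for duplicates
        }
    }
    finally {
        $reader.Dispose()
    }
    $seen | Sort-Object
}

# Usage with a temporary file built from the sample log in the question.
$tmp = [System.IO.Path]::GetTempFileName()
@(
    '#Fields:1', '#Fields:2', '#Fields:3', '#Fields:4',
    '#Fields: date-time,connector-id,session-id,sequence-number,local-endpoint,remote-endpoint,event,data,context',
    '2023-01-31T13:53:50.404Z,EXCH1\Relay-EXCH1,08DAD23366676FF1,41,10.10.10.2:25,195.85.212.22:15650,<,DATA,',
    '2023-01-31T13:53:50.404Z,EXCH1\Relay-EXCH1,08DAD23366676FF1,41,10.10.10.2:25,195.85.212.25:15650,<,DATA,',
    '2023-01-31T13:53:50.404Z,EXCH1\Relay-EXCH1,08DAD23366676FF1,41,10.10.10.2:25,195.85.212.22:15650,<,DATA,'
) | Set-Content -Path $tmp
$ips = Get-UniqueRemoteIP -Path $tmp
Remove-Item -Path $tmp
$ips
```

Pass a full path (for example $log.FullName), because .NET resolves relative paths against the process working directory rather than the current PowerShell location.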
