I am using PowerShell 7.
We have the following PowerShell script that parses some very large files.
I no longer want to use Get-Content, as it is too slow.
The script below works, but it takes a very long time to process even a 10 MB file.
I have about 200 files of roughly 10 MB each, with over 10,000 lines per file.
Sample Log:
#Fields:1
#Fields:2
#Fields:3
#Fields:4
#Fields: date-time,connector-id,session-id,sequence-number,local-endpoint,remote-endpoint,event,data,context
2023-01-31T13:53:50.404Z,EXCH1\Relay-EXCH1,08DAD23366676FF1,41,10.10.10.2:25,195.85.212.22:15650,<,DATA,
2023-01-31T13:53:50.404Z,EXCH1\Relay-EXCH1,08DAD23366676FF1,41,10.10.10.2:25,195.85.212.25:15650,<,DATA,
Script:
$Output = @()
$LogFilePath = "C:\LOGS\*.log"
$LogFiles = Get-Item $LogFilePath
$Count = @($LogFiles).Count
ForEach ($Log in $LogFiles)
{
    $Int = $Int + 1
    $Percent = $Int / $Count * 100
    Write-Progress -Activity "Collecting Log details" -Status "Processing log File $Int of $Count - $Log" -PercentComplete $Percent
    Write-Host "Processing Log File $Log" -ForegroundColor Magenta
    Write-Host
    $FileContent = Get-Content $Log | Select-Object -Skip 5
    ForEach ($Line in $FileContent)
    {
        $Socket = $Line | ForEach-Object { $_.Split(",")[5] }
        $IP = $Socket.Split(":")[0]
        $Output += $IP
    }
}
$Output = $Output | Select-Object -Unique
$Output = $Output | Sort-Object
Write-Host "List of noted remote IPs:"
$Output
Write-Host
$Output | Out-File $PWD\Output.txt
CodePudding user response:
As @iRon suggests, the addition assignment operator (+=) carries a lot of overhead, as does reading an entire file into a variable before processing it. Consider processing the files strictly as a pipeline instead. I got the same results with your sample data using the code below.
$LogFilePath = "C:\LOGS\*.log"
$LogFiles = Get-ChildItem $LogFilePath
$Count = @($LogFiles).Count
$Output = ForEach ($Log in $LogFiles) {
    # Code for Write-Progress here
    Get-Content -Path $Log.FullName | Select-Object -Skip 5 | ForEach-Object {
        $Socket = $_.Split(",")[5]
        $IP = $Socket.Split(":")[0]
        $IP
    }
}
$Output = $Output | Select-Object -Unique
$Output = $Output | Sort-Object
Write-Host "List of noted remote IPs:"
$Output
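To illustrate the overhead of +=: PowerShell arrays are fixed-size, so each += allocates a new array and copies the existing elements into it. The following is a minimal sketch comparing that pattern with capturing the loop's output in one assignment. The item count of 20000 is arbitrary, and the timings will vary by machine and PowerShell version.

```powershell
# Arbitrary item count, chosen for illustration only
$n = 20000

# Pattern 1: += on an array builds a new array on every iteration
$slow = Measure-Command {
    $a = @()
    foreach ($i in 1..$n) { $a += $i }
}

# Pattern 2: let the foreach statement emit values and capture them once
$fast = Measure-Command {
    $b = foreach ($i in 1..$n) { $i }
}

'+= took {0:N0} ms; capturing loop output took {1:N0} ms' -f $slow.TotalMilliseconds, $fast.TotalMilliseconds
```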
CodePudding user response:
Apart from the notable points in the comments, I believe this question is better suited to Code Review. Nonetheless, here's my take on this using the StreamReader class:
$LogFilePath = "C:\LOGS\*.log"
$LogFiles = Get-Item -Path $LogFilePath
$OutPut = [System.Collections.ArrayList]::new()
foreach ($log in $LogFiles)
{
    $skip = 0
    $stop = $false
    $stream = [System.IO.StreamReader]::new($log.FullName)
    # ReadLine() returns $null at end of file; an empty line also ends this loop
    while ($line = $stream.ReadLine())
    {
        # Skip the first 5 (header) lines, then split each line on ',' or ':'
        # and take the 5th element from the end, which is the remote IP
        if (-not $stop)
        {
            $skip++
            if ($skip -eq 5)
            {
                $stop = $true
            }
            continue
        }
        elseif ($OutPut.Contains(($IP = ($line -split ',|:')[-5])))
        {
            continue
        }
        $null = $OutPut.Add($IP)
    }
    $stream.Close()
    $stream.Dispose()
}
# Display OutPut and save to file
Write-Host -Object "List of noted remote IPs:"
$OutPut | Sort-Object | Tee-Object -FilePath "$PWD\Output.txt"
This way you output only unique IPs, since the if statement checks each one against what's already in $OutPut, essentially replacing Select-Object -Unique. You should see a speed increase because you're no longer adding to a fixed-size array (+=) or piping to other cmdlets.
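One possible variation, offered as a sketch rather than as part of the approach above: ArrayList.Contains scans the whole list on every call, whereas a HashSet[string] ignores duplicates in roughly constant time. The function name Get-UniqueRemoteIP and its parameters are hypothetical. The sketch assumes every file starts with five header lines and that remote-endpoint is always the sixth comma-separated field, as in the sample log.

```powershell
# Hypothetical helper: collects the unique remote IPs from a set of log files.
function Get-UniqueRemoteIP {
    param(
        [Parameter(Mandatory)]
        [string[]] $Path,       # full paths of the log files
        [int] $SkipLines = 5    # header lines to skip (assumed from the sample log)
    )

    $seen = [System.Collections.Generic.HashSet[string]]::new()
    foreach ($file in $Path) {
        $lineNumber = 0
        # File.ReadLines enumerates the file lazily, one line at a time
        foreach ($line in [System.IO.File]::ReadLines($file)) {
            $lineNumber++
            if ($lineNumber -le $SkipLines) { continue }
            $fields = $line.Split(',')
            if ($fields.Count -lt 6) { continue }   # ignore short or empty lines
            # Index 5 is remote-endpoint ("IP:port"); keep only the IP part.
            # Add() returns $false for a duplicate, so no separate lookup is needed.
            $null = $seen.Add($fields[5].Split(':')[0])
        }
    }
    $seen | Sort-Object
}
```

It could be called as Get-UniqueRemoteIP -Path (Get-Item 'C:\LOGS\*.log').FullName | Tee-Object -FilePath "$PWD\Output.txt". Full paths matter here because .NET file APIs resolve relative paths against the process working directory, which may differ from the current PowerShell location.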