Check all lines in a huge CSV file in PowerShell


I am working with a CSV file of more than 300,000 lines. I need to check the information line by line and then write it to a .txt file as a table showing which files are missing on which servers. For example:

Name,Server
File1,Server1
File2,Server1
File3,Server1
File1,Server2
File2,Server2
...
File345,Server76
File346,Server32

I want to display the result as a table corresponding to the example above:

Name     Server1 Server2 ... Server32 ... Server76
File1       X       X
File2       X       X
File3       X       
...
File345                                       X
File346                          X

To do this, I currently have a function that creates objects whose members are the server names (the number of members can change), and I use a StreamReader to split the data (my CSV has more than two columns, so index 0 is the server name and index 5 is the file name):

$stream = [System.IO.StreamReader]::new($File)
$stream.ReadLine() | Out-Null   # skip the header line
while (-not $stream.EndOfStream) {
    $line = $stream.ReadLine()
    $strTempo = $line -split ","
    # column 5 holds the file name; look for an existing row object
    $index = $listOfFile.Name.IndexOf($strTempo[5])
    if ($index -ne -1) {
        # column 0 holds the server name; mark the file as present on it
        $property = $strTempo[0].Replace("-", "_")
        $listOfFile[$index].$property = "X"
    }
    else {
        # file not seen yet: create a new row object with all server members
        $obj = CreateEmptyObject ($listOfConfiguration)
        $obj.Name = $strTempo[5]
        $listOfFile.Add($obj) | Out-Null
    }
}

When I export this I get a pretty good result, but the script takes a long time (between 20 minutes and 1 hour).

I don't know how to optimize the script; I'm a beginner in PowerShell.

Thanks for any tips.

CodePudding user response:

You might use HashSets for this:

$Servers = [System.Collections.Generic.HashSet[String]]::New()   # every distinct server name
$Files = @{}                                                      # file name -> set of servers that have it
Import-Csv -Path $Path | ForEach-Object {
    $Null = $Servers.Add($_.Server)
    if ($Files.Contains($_.Name)) { $Null = $Files[$_.Name].Add($_.Server) }
    else { $Files[$_.Name] = [System.Collections.Generic.HashSet[String]]$_.Server }
}
$Table = foreach ($Name in $Files.get_Keys()) {
    $Properties = [Ordered]@{ Name = $Name }
    ForEach ($Server in $Servers) {
        # 'X' when the file exists on this server, otherwise leave the cell empty
        $Properties[$Server] = if ($Files[$Name].Contains($Server)) { 'X' }
    }
    [PSCustomObject]$Properties
}
$Table | Format-Table -Property @{ expression = '*' }

Note that, in contrast to PowerShell's usual behavior, the .NET HashSet class is case-sensitive by default. To create a case-insensitive HashSet, use the following constructor:

[System.Collections.Generic.HashSet[String]]::New([StringComparer]::OrdinalIgnoreCase)
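
As a sketch of how that comparer might be wired into the snippet above (assuming the same $Path, $Servers and $Files variables), both the global server set and each per-file set would take it:

$Comparer = [StringComparer]::OrdinalIgnoreCase
$Servers  = [System.Collections.Generic.HashSet[String]]::new($Comparer)
$Files    = @{}
Import-Csv -Path $Path | ForEach-Object {
    $Null = $Servers.Add($_.Server)
    if ($Files.Contains($_.Name)) { $Null = $Files[$_.Name].Add($_.Server) }
    else {
        # start the per-file set with the same case-insensitive comparer
        $Files[$_.Name] = [System.Collections.Generic.HashSet[String]]::new([String[]]@($_.Server), $Comparer)
    }
}

(PowerShell hashtables such as $Files are already case-insensitive for string keys by default, so only the HashSets need the comparer.)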

CodePudding user response:

See if this works faster. Change the file name as required:

$Path = "C:\temp\test1.txt"
$table = Import-Csv -Path $Path

# one column per distinct server name
$columnNames = $table | Select-Object -Property Server -Unique | foreach{ $_.Server } | Sort-Object
Write-Host "names = " $columnNames

# one group per file name
$groups = $table | Group-Object { $_.Name }

$outputTable = [System.Collections.ArrayList]@()
foreach ($group in $groups)
{
    Write-Host "Group = " $group.Name
    $newRow = New-Object -TypeName psobject
    $newRow | Add-Member -NotePropertyName Name -NotePropertyValue $group.Name

    # servers on which this file was found
    $servers = $group.Group | Select-Object -Property Server | foreach{ $_.Server }
    Write-Host "servers = " $servers

    foreach ($item in $columnNames)
    {
        if ($servers.Contains($item))
        {
            $newRow | Add-Member -NotePropertyName $item -NotePropertyValue 'X'
        }
        else
        {
            # the empty value is still needed so every row exposes the same set of columns
            $newRow | Add-Member -NotePropertyName $item -NotePropertyValue ''
        }
    }
    $outputTable.Add($newRow) | Out-Null
}
$outputTable | Format-Table
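
Since the goal is a .txt report, the resulting table can also be written to disk; a minimal sketch, with the output path below being only a placeholder:

# placeholder path; -Width keeps wide tables from being truncated at the default console width
$outputTable | Format-Table -AutoSize | Out-File -FilePath 'C:\temp\result.txt' -Width 500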