Home > Enterprise >  How to speed up my SUPER SLOW search script
How to speed up my SUPER SLOW search script

Time:05-14

I am building a script to search for $name through a large batch of CSV files. These files can be as big as 67,000 KB. This is my script that I use to search the files:

Powershell Script

Essentially, I use Import-Csv. I change a few things depending on the file name, however. For example, some files don't have headers, or they may use a different delimiter. Then I store all the matches in $results and then return that variable. This is all put in a function called CSVSearch for ease of running.

#create function called CSV Search
function CSVSearch{
    #prompt
    $name = Read-Host -Prompt 'Input name'

    #set path to root folder
    $path = 'Path\to\root\folder\'

    #get the file path for each CSV file in root folder
    $files = Get-ChildItem $path -Filter *.csv | Select-Object -ExpandProperty FullName

    #count files in $files
    $filesCount = $files.Count

    #create empty array, $results
    $results= @()

    #count for write-progress
    $i = 0

      foreach($file in $files){
       
        Write-Progress -Activity "Searching files: $i out of $filesCount searched. $resultsCount match(es) found" -PercentComplete (($i/$files.Count)*100)
        #import method changes depending on CSV file name found in $file (headers, delimiters). 
        if($file -match 'File1*'){$results  = Import-Csv $file -Header A, Name, C, D -Delimiter '|' | Select-Object *,@{Name='FileName';Expression={$file}} | Where-Object { $_.'Name' -match $name}}
        if($file -match 'File2*'){$results  = Import-Csv $file -Header A, B, Name  -Delimiter '|' | Select-Object *,@{Name='FileName';Expression={$file}} | Where-Object { $_.'Name' -match $name}}
        if($file -match 'File3*'){$results  = Import-Csv $file | Select-Object *,@{Name='FileName';Expression={$file}} | Where-Object { $_.'Name' -match $name}}
        if($file -match 'File4*'){$results  = Import-Csv $file | Select-Object *,@{Name='FileName';Expression={$file}} | Where-Object { $_.'Name' -match $name}}
        $i  
        $resultsCount = $results.Count
    
    }

    #if the loop ends and $results array is empty, return "No matches."
    if(!$results){Write-Host 'No matches found.' -ForegroundColor Yellow}
    #return results stored in $results variable
    else{$results 
    Write-Host $resultsCount 'matches found.' -ForegroundColor Green
    Write-Progress -Activity "Completed" -Completed}

}

CSVSearch

Below are what the CSV files look like. Obviously, the amount of the data below is not going to equate to the actual size of the files. But below is the basic structure:

CSV files

File1.csv
1|Moonknight|QWEPP|L
2|Star Wars|QWEPP|T
3|Toy Story|QWEPP|U
File2.csv
JKLH|1|Moonknight
ASDF|2|Star Wars
QWER|3|Toy Story
File3.csv
1,Moonknight,AA,DDD
2,Star Wars,BB,CCC
3,Toy Story,CC,EEE
File4.csv
1,Moonknight,QWE
2,Star Wars,QWE
3,Toy Story,QWE

The script works great. Here is an example of the output I would receive if $name = Moonknight:

Example of results

A : 1
Name : Moonknight
C: QWE
FileName: Path\to\root\folder\File4.csv

A : 1
Name : Moonknight
B : AA
C : DDD
FileName: Path\to\root\folder\File3.csv

A : JKLH
B : 1
Name : Moonknight
FileName: Path\to\root\folder\File2.csv

A : 1
Name : Moonknight
C : QWEPP
D : L
FileName: Path\to\root\folder\File1.csv

4 matches found.

However, it is very slow, and I have a lot of files to search through. Any ideas on how to speed my script up?

Edit: I must mention. I tried importing the data into a hash table and then searching the hash table, but that was much slower.

CodePudding user response:

Give this one a try, it should be a bit faster. Select-Object has to reconstruct your object, if you use it before filtering, you're actually recreating your entire CSV, you want to filter first (Where-Object / .Where) before reconstructing it.

.Where should be a faster than Where-Object here, the caveat is that the intrinsic method requires that the collections already exists in memory, there is no pipeline processing and no streaming.

Write-Progress will only slow down your script, better remove it.

Lastly, you can use splatting to avoid having multiple if conditions.

function CSVSearch {
    [cmdletbinding()]
    param(
        [Parameter(Mandatory)]
        [string] $Name,

        [Parameter()]
        [string] $Path = 'Path\to\root\folder\'
    )

    $param = @{
        File1 = @{ Header = 'A', 'Name', 'C', 'D'; Delimiter = '|' }
        File2 = @{ Header = 'A', 'B', 'Name' ; Delimiter = '|' }
        File3 = @{}; File4 = @{} # File3 & 4 should have headers ?
    }
    $results = foreach($file in Get-ChildItem . -Filter file*.csv) {
        $thisparam = $param[$file.BaseName]
        $thisparam['LiteralPath'] = $file.FullName
        (Import-Csv @thisparam).where{ $_.Name -match $name } |
            Select-Object *, @{Name='FileName';Expression={$file}}
    }

    if(-not $results) {
        Write-Host 'No matches found.' -ForegroundColor Yellow
        return 
    }
        
    Write-Host "$($results.Count) matches found." -ForegroundColor Green
    $results
}

CSVSearch -Name Moonknight

If you want the function to stream results as they're found, you can use a Filter, this is a very efficient filtering technique, certainly faster than Where-Object:

function CSVSearch {
    [cmdletbinding()]
    param(
        [Parameter(Mandatory)]
        [string] $Name,

        [Parameter()]
        [string] $Path = 'Path\to\root\folder\'
    )

    begin {
        $param = @{
            File1 = @{ Header = 'A', 'Name', 'C', 'D'; Delimiter = '|' }
            File2 = @{ Header = 'A', 'B', 'Name' ; Delimiter = '|' }
            File3 = @{}; File4 = @{} # File3 & 4 should have headers ?
        }
        $counter = [ref] 0
        filter myFilter {
            if($_.Name -match $name) {
                $counter.Value  
                $_ | Select-Object *, @{N='FileName';E={$file}}
            }
        }
    }
    process {
        foreach($file in Get-ChildItem $path -Filter *.csv) {
            $thisparam = $param[$file.BaseName]
            $thisparam['LiteralPath'] = $file.FullName
            Import-Csv @thisparam | myFilter
        }
    }
    end {
        if(-not $counter.Value) {
            Write-Host 'No matches found.' -ForegroundColor Yellow
            return 
        }
        Write-Host "$($counter.Value) matches found." -ForegroundColor Green
    }
}
  • Related