I wrote this script to search a lot of text files (~100,000) for 4 different search criteria and export to 4 separate files, I thought it would be more efficient to perform all 4 searches on each file as it is loaded vs doing 4 full searches like the first iteration below does. I may be missing some other major inefficiencies as I am pretty new to powershell.
I have this script re written from the first version to the second, but can't figure out how to get the path and data to display together like the first version did. I am struggling to reference the object within the loop, and have pieced this second version together, which is working, but not giving me the path to the file which is necessary.
It seems like I am just missing one or two little things to get me going in the right direction. Thanks in advance for your help
1st version:
Get-ChildItem -Filter *.txt -Path "\\file\to\search" -Recurse | Select-String -Pattern "abc123" -Context 0,3 | Out-File -FilePath "\\c:\out.txt"
Get-ChildItem -Filter *.txt -Path "\\file\to\search2" -Recurse | Select-String -Pattern "abc124" -Context 0,3 | Out-File -FilePath "\\c:\out2.txt"
Get-ChildItem -Filter *.txt -Path "\\file\to\search3" -Recurse | Select-String -Pattern "abc125" -Context 0,3 | Out-File -FilePath "\\c:\out3.txt"
Get-ChildItem -Filter *.txt -Path "\\file\to\search4" -Recurse | Select-String -Pattern "abc126" -Context 0,3 | Out-File -FilePath "\\c:\out4.txt"
Output:
\\file\that\was\found\example.txt:84: abc123
\\file\that\was\found\example.txt:90: abc123
\\file\that\was\found\example.txt:91: abc123
2nd version:
##$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$ Configuration $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
############################################ Global Parameters #############################################
$SearchPath="\\file\to\search"
$ProgressFile=""\\progress\file\ResultsCount.txt"
$records = 105325
##----------------------------------------- End Global Parameters -----------------------------------------
########################################### Search Parameters ##############################################
##Search Pattern 1
$Pattern1="abc123"
$SaveFile1="\\c:\out.txt"
##Search Pattern 2
$Pattern2="abc124"
$SaveFile2="\\c:\out2.txt"
##Search Pattern 3
$Pattern3= "abc125"
$SaveFile3= "\\c:\out3.txt"
##Search Pattern 4
$Pattern4= "abc126"
$SaveFile4="\\c:\out4.txt"
##Search Pattern 5
$Pattern5= ""
$SaveFile5=""
##----------------------------------------- End Search Parameters ------------------------------------------
##$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$ End of Config $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
############################### SCRIPT #####################################################################
## NOTES
## ------
##$files=Get-ChildItem -Filter *.txt -Path $SearchPath -Recurse ## Set all files to variable #### Long running, needs to be a better way #######
##$records=$files.count ## Set record #
Get-ChildItem -Filter *.txt -Path $SearchPath -Recurse | Foreach-Object { ## loop through search folder
$i=$i 1 ## increment record
##
Get-Content $_.FullName | Select-String -Pattern $Pattern1 -Context 0,3 | Out-File -FilePath $SaveFile1 ## pattern1 search
Get-Content $_.FullName | Select-String -Pattern $Pattern2 | Out-File -FilePath $SaveFile2 ## pattern2 search
Get-Content $_.FullName | Select-String -Pattern $Pattern3 -Context 0,1 | Out-File -FilePath $SaveFile3 ## pattern3 search
Get-Content $_.FullName | Select-String -Pattern $Pattern4 -Context 0,1 | Out-File -FilePath $SaveFile4 ## pattern4 search
##Get-Content $_.FullName | Select-String -Pattern $Pattern5 -Context 0,1 | Out-File -FilePath $SaveFile5 ## pattern5 search (Comment out unneeded search lines like this one)
$progress ="Record $($i) of $($records)" ## set progress
Write-Host "Record $($i) of $($records)" ## Writes progress to window
$progress | Out-File -FilePath $ProgressFile ## progress file
} ##
############################################################################################################
Output:
abc123
abc123
abc123
Edit: Also I am trying to figure out a good way to not have to hard code in the number of records for a decent progress readout, I commented out the way I thought would work (1st & 2nd line of the script), but there needs to be a more efficient way than rerunning the same search twice, one for a count and one for the for loop.
I would be very interested in any runtime efficiency information you could provide.
CodePudding user response:
the Select-String
cmdlet will accept a -Path
parameter ... and it is FAR faster than using Get-ChildItem
to feed the files to S-S
. [grin]
also, the -Pattern
parameter accepts a regex OR pattern like Thing|OtherThing|YetAnotherThing
.
also also, the -SimpleMatch
parameter accepts an array of strings to use for matching stuff without wildcards or regex.
what the code does ...
- defines the source dir
- defines the file spec
- joins those two into a wildcard file path
- builds an array of string patterns to use
- calls
Select-String
with a path and an array of strings to search for - uses
Group-Object
and a calculated property to group the matches by the last part of.Line
property from theS-S
call - saves that to a $Var
- shows that on screen
at that point, you can use the .Name
property of each GroupInfo
to select the items to send out to each file AND to build your file names.
the code ...
$SourceDir = 'D:\Temp\zzz - Copy'
$FileSpec = '*.log'
$SD_FileSpec = Join-Path -Path $SourceDir -ChildPath $FileSpec
$TargetPatternList = @(
'Accordion Cajun Zydeco'
'better-not-be-there'
'Piano Rockabilly Rowdy'
)
$GO_Results = Select-String -Path $SD_FileSpec -SimpleMatch $TargetPatternList |
Group-Object -Property {$_.Line.Split(':')[-1]}
$GO_Results
output ...
Count Name Group
----- ---- -----
6 Accordion Cajun Zydeco {D:\Temp\zzz - Copy\Grouping-List_08-02.log:11:Accordion Cajun Zydeco, D:\Temp\zzz - Copy\Grouping-List_08-09.log:11:Accordion Cajun Zy...
6 Bawdy Dupe Piano Rocka... {D:\Temp\zzz - Copy\Grouping-List_08-02.log:108:Bawdy Dupe Piano Rockabilly Rowdy, D:\Temp\zzz - Copy\Grouping-List_08-09.log:108:Bawdy...
6 Bawdy Piano Rockabilly... {D:\Temp\zzz - Copy\Grouping-List_08-02.log:138:Bawdy Piano Rockabilly Rowdy, D:\Temp\zzz - Copy\Grouping-List_08-09.log:138:Bawdy Pian...
6 Dupe Piano Rockabilly ... {D:\Temp\zzz - Copy\Grouping-List_08-02.log:948:Dupe Piano Rockabilly Rowdy, D:\Temp\zzz - Copy\Grouping-List_08-09.log:948:Dupe Piano ...
6 Instrumental Piano Roc... {D:\Temp\zzz - Copy\Grouping-List_08-02.log:1563:Instrumental Piano Rockabilly Rowdy, D:\Temp\zzz - Copy\Grouping-List_08-09.log:1563:I...
6 Piano Rockabilly Rowdy {D:\Temp\zzz - Copy\Grouping-List_08-02.log:1781:Piano Rockabilly Rowdy, D:\Temp\zzz - Copy\Grouping-List_08-09.log:1781:Piano Rockabil...
note that the .Group
contains an array of lines from the matches sent out by the S-S
call. you can send that to your output file.
CodePudding user response:
Here is my take at solving this problem, very similar to Lee_Dailey's nice answer but with a foreach
loop. I would recommend investing some time into researching the multi-threading options available on PowerShell in case you need to increase the performance of the script, you can look specifically at the ThreadJob module by Microsoft which is really easy to use or if you can't install modules due to some work policy, you can use Runspace.
It is worth adding that you can use the -List
switch on Select-String
, this way the performance of the script would be increased even more:
-List
Only the first instance of matching text is returned from each input file. This is the most efficient way to retrieve a list of files that have contents matching the regular expression.
$map = @{
abc123 = 'C:\out_abc123.txt'
abc124 = 'C:\out_abc124.txt'
abc125 = 'C:\out_abc125.txt'
}
$pattern = $map.Keys -join '|'
$match = foreach($file in Get-ChildItem *.txt)
{
Select-String -LiteralPath $file.FullName -Pattern $pattern
}
$match | Group-Object { $_.Matches.Value } | ForEach-Object {
$_.Group | Select-Object Path, LineNumber, Line | Out-File $map[$_.Name]
}