I have a script that was put together with the community help and it works great. I'm trying to alter it to work for my new requirements. I have data (See below) that needs to be captured from 10K files. Inside each file could be multiple iterations of the data I need to capture. The script I'm using captured only the file name and not the data I need. Can someone see where I'm off (Code wise) and offer another option.
Here is my Raw Data. There could be four or five places in a text member with this data and I need to capture it all. The words in Bold (Select and From) are the words I'm keying on. I need to capture those lines as well as all the lines between them.
SELECT FIL_ID_NUM, 00481002
LAU_CRE_DTS, 00482002
LAU_CD, 00483002
LAU_EFF_DT, 00484002
LAU_EFF_TIM, 00485002
LAU_EXP_DT, 00486002
LAU_EXP_TIM, 00487002
LAU_SERV_REAS1_CD, 00488002
LAU_SERV_REAS2_CD, 00489002
LAU_SERV_REAS3_CD, 00489102
LAU_CARE_IND, 00489202
SEG_UPDT_LOCK_NUM, 00489302
LAU_AGRMT_PER_NAM, 00489402
LAU_AGRMT_DSGT_NAM, 00489502
LAU_AGRMT_DIR_TXT, 00489602
LAU_TERMS_TXT 00489702
00520002
FROM TLEGAL
Here is my script....
# create a List object to collect the 'flowerbox' strings in
$Flowerbox = [System.Collections.Generic.List[string]]::new()
$treat = $false
# get a list of the .pl1m files and loop through. Collect all output in variable $result
$CBLFileList = Get-ChildItem -Path 'C:\src' -Filter '*.pl1m' -File -Recurse
$result = foreach ($CBLFile in $CBLFileList) {
Write-Host "Processing ... $($CBLFile.Name)" -ForegroundColor Green
$Flowerbox.Clear() # empty the list for the next run
$treat = $false
switch -Regex -File $CBLFile.FullName {
'SELECT' {
Write-Host "Found Select" -ForegroundColor DarkYellow
# start collecting lines from here
$treat = $true
}
'FROM ' {
Write-Host "Found From" -ForegroundColor DarkRed
if ($treat) {
# stop colecting Flowerbox lines and output what we already have
# output an object with the two properties you need
[PsCustomObject]@{
Program = $CBLFile.Name # or $CBLFile.FullName
Description = $Flowerbox -join [environment]::NewLine
}
}
$Flowerbox.Clear() # empty the list for the next run
$treat = $false
}
default {
# Add captures data here
if ($treat -and ($_ -match '/[^\s\\]/')) {
$Flowerbox.Add($Matches[1])
}
}
}
}
# now you have everything in an array of PSObjects so you can save that as Csv
$result | Export-Csv -Path 'C:\src\SQL_Desc.csv' -UseCulture -NoTypeInformation
CodePudding user response:
Your script needs only a few tweaks:
- Add the full line, reflected in
$_
, to your$Flowerbox
list - Also do so in the
'SELECT'
and'FROM '
branches.
# ...
$Flowerbox.Clear() # empty the list for the next run
$treat = $false
switch -Regex -CaseSensitive -File $CBLFile.FullName {
'^\s*SELECT\s' {
Write-Host "Found Select" -ForegroundColor DarkYellow
# start collecting lines from here
$treat = $true
$Flowerbox.Add($_) # Add this line to the list.
continue # So that the FROM branch needn't also be checked.
}
'^\s*FROM\s' {
Write-Host "Found From" -ForegroundColor DarkRed
if ($treat) {
$Flowerbox.Add($_) # Add this line to the list.
# stop colecting Flowerbox lines and output what we already have
# output an object with the two properties you need
[PsCustomObject]@{
Program = $CBLFile.Name # or $CBLFile.FullName
Description = $Flowerbox -join [environment]::NewLine
}
$Flowerbox.Clear() # empty the list for the next run
$treat = $false
}
}
default {
if ($treat) {
$Flowerbox.Add($_) # Add this line to the list.
}
}
}
# ...
Note that, in the interest of minimizing the risk of false positives:
-CaseSensitive
was added to theswitch
statement for case-sensitive matching.The regexes were made more robust to prevent accidental substring matching and to tolerate whitespace variations (spaces vs. tabs).
CodePudding user response:
You could try using this function to find all those files that contain those fragments starting on SELECT
and ending on FROM
, the function would output objects with the captured fragment (Value
), Index
, Path
and Length
. The regex is likely to be improved, but this worked for me with some test files.
NOTE: this answer requires that the files fit in memory, as the functions reads the files as a whole string.
using namespace System.IO
function Find-String {
param(
[parameter(ValueFromPipeline, Mandatory)]
[Alias('PSPath')]
[FileInfo]$Path,
[parameter(Mandatory, Position = 0)]
[regex]$Pattern,
[switch]$AllMatches
)
process {
$content = [File]::ReadAllText($Path)
$match = if($AllMatches.IsPresent) {
$Pattern.Matches($content)
}
else {
$Pattern.Match($content)
}
if($match.Success -notcontains $true) {
return
}
foreach($m in $match) {
[pscustomobject]@{
Path = $path.FullName
Value = $m.Value
Index = $m.Index
Length = $m.Length
}
}
}
}
Get-ChildItem -Path 'C:\src' -Filter '*.pl1m' -Recurse |
Find-String -Pattern '(?msi)SELECT.*?FROM.*?$' -AllMatches |
Export-Csv path\to\export.csv -NoTypeInformation
Output with one particular file looks like this:
Path : /home/user/Documents/test.txt
Value : SELECT FIL_ID_NUM, 00481002
...
...
...
FROM SOMETABLE
Index : 11
Length : 449
Path : /home/user/Documents/test.txt
Value : SELECT FIL_ID_NUM, 00481002
...
...
...
FROM SOMEOTHERTABLE
Index : 501
Length : 522