Home > Net >  Powershell - capture all non-blank lines between key words
Powershell - capture all non-blank lines between key words

Time:02-11

I have a script that was put together with the community help and it works great. I'm trying to alter it to work for my new requirements. I have data (See below) that needs to be captured from 10K files. Inside each file could be multiple iterations of the data I need to capture. The script I'm using captured only the file name and not the data I need. Can someone see where I'm off (Code wise) and offer another option.

Here is my Raw Data. There could be four or five places in a text member with this data and I need to capture it all. The words in Bold (Select and From) are the words I'm keying on. I need to capture those lines as well as all the lines between them.

    SELECT   FIL_ID_NUM,                                             00481002
             LAU_CRE_DTS,                                            00482002
             LAU_CD,                                                 00483002
             LAU_EFF_DT,                                             00484002
             LAU_EFF_TIM,                                            00485002
             LAU_EXP_DT,                                             00486002
             LAU_EXP_TIM,                                            00487002
             LAU_SERV_REAS1_CD,                                      00488002
             LAU_SERV_REAS2_CD,                                      00489002
             LAU_SERV_REAS3_CD,                                      00489102
             LAU_CARE_IND,                                           00489202
             SEG_UPDT_LOCK_NUM,                                      00489302
             LAU_AGRMT_PER_NAM,                                      00489402
             LAU_AGRMT_DSGT_NAM,                                     00489502
             LAU_AGRMT_DIR_TXT,                                      00489602
             LAU_TERMS_TXT                                           00489702
                                                                     00520002
        FROM TLEGAL

Here is my script....

# create a List object to collect the 'flowerbox' strings in
$Flowerbox = [System.Collections.Generic.List[string]]::new()
$treat = $false

# get a list of the .pl1m files and loop through. Collect all output in variable $result
$CBLFileList = Get-ChildItem -Path 'C:\src' -Filter '*.pl1m' -File -Recurse
$result = foreach ($CBLFile in $CBLFileList) {
    Write-Host "Processing ... $($CBLFile.Name)" -ForegroundColor Green
    
    $Flowerbox.Clear()  # empty the list for the next run
    $treat = $false
    switch -Regex -File $CBLFile.FullName {
        'SELECT' { 
        Write-Host "Found Select" -ForegroundColor DarkYellow
            # start collecting lines from here    
            $treat = $true
        }
        'FROM ' {
        Write-Host "Found From" -ForegroundColor DarkRed
           if ($treat) {
           
            # stop colecting Flowerbox lines and output what we already have
            # output an object with the two properties you need
            [PsCustomObject]@{
                Program     = $CBLFile.Name  # or $CBLFile.FullName
                Description = $Flowerbox -join [environment]::NewLine
            }
            }
            $Flowerbox.Clear()  # empty the list for the next run
            $treat = $false
        }
        default {
                # Add captures data here
                if ($treat -and ($_ -match '/[^\s\\]/')) { 
                $Flowerbox.Add($Matches[1])
            }
        }
    }
}

# now you have everything in an array of PSObjects so you can save that as Csv
$result | Export-Csv -Path 'C:\src\SQL_Desc.csv' -UseCulture -NoTypeInformation

CodePudding user response:

Your script needs only a few tweaks:

  • Add the full line, reflected in $_, to your $Flowerbox list
  • Also do so in the 'SELECT' and 'FROM ' branches.
# ...
$Flowerbox.Clear()  # empty the list for the next run
$treat = $false
switch -Regex -CaseSensitive -File $CBLFile.FullName {
  '^\s*SELECT\s' { 
    Write-Host "Found Select" -ForegroundColor DarkYellow
    # start collecting lines from here    
    $treat = $true
    $Flowerbox.Add($_) # Add this line to the list.
    continue # So that the FROM branch needn't also be checked.
  }
  '^\s*FROM\s' {
    Write-Host "Found From" -ForegroundColor DarkRed
    if ($treat) {
      $Flowerbox.Add($_)  # Add this line to the list.
      # stop colecting Flowerbox lines and output what we already have
      # output an object with the two properties you need
      [PsCustomObject]@{
        Program     = $CBLFile.Name  # or $CBLFile.FullName
        Description = $Flowerbox -join [environment]::NewLine
      }
      $Flowerbox.Clear()  # empty the list for the next run
      $treat = $false
    }
  }
  default {
    if ($treat) {
      $Flowerbox.Add($_)  # Add this line to the list.
    }
  }
}
# ...

Note that, in the interest of minimizing the risk of false positives:

  • -CaseSensitive was added to the switch statement for case-sensitive matching.

  • The regexes were made more robust to prevent accidental substring matching and to tolerate whitespace variations (spaces vs. tabs).

CodePudding user response:

You could try using this function to find all those files that contain those fragments starting on SELECT and ending on FROM, the function would output objects with the captured fragment (Value), Index, Path and Length. The regex is likely to be improved, but this worked for me with some test files.

NOTE: this answer requires that the files fit in memory, as the functions reads the files as a whole string.

using namespace System.IO

function Find-String {
param(
    [parameter(ValueFromPipeline, Mandatory)]
    [Alias('PSPath')]
    [FileInfo]$Path,
    [parameter(Mandatory, Position = 0)]
    [regex]$Pattern,
    [switch]$AllMatches
)

    process {
        $content = [File]::ReadAllText($Path)
        $match = if($AllMatches.IsPresent) {
            $Pattern.Matches($content)
        }
        else {
            $Pattern.Match($content)
        }

        if($match.Success -notcontains $true) {
            return
        }

        foreach($m in $match) {
            [pscustomobject]@{
                Path   = $path.FullName
                Value  = $m.Value
                Index  = $m.Index
                Length = $m.Length
            }
        }
    }
}

Get-ChildItem -Path 'C:\src' -Filter '*.pl1m' -Recurse |
    Find-String -Pattern '(?msi)SELECT.*?FROM.*?$' -AllMatches |
    Export-Csv path\to\export.csv -NoTypeInformation

Output with one particular file looks like this:

Path   : /home/user/Documents/test.txt
Value  : SELECT  FIL_ID_NUM,             00481002
         ...
         ...
         ...
         FROM SOMETABLE
Index  : 11
Length : 449

Path   : /home/user/Documents/test.txt
Value  : SELECT  FIL_ID_NUM,             00481002
         ...
         ...
         ...
         FROM SOMEOTHERTABLE
Index  : 501
Length : 522
  • Related