Removing a block of code if the string is found in the entire block using PowerShell

Time:03-24

I have a file, DJ.bat, and I am reading its content using PowerShell.

DJENGINE -l "Y:\st\Retail\FTP\2022-03-23\TMSfuture_cost_2554_20220323065246.tf.err" -sc "File=Y:\st\Retail\FTP\2022-03-23\TMSfuture_cost_2554_20220323065246.dat" -tc "Server=ABC.stg.sql.ccaintranet.com;Database=ABCTMS_FC;Table=dbo.FC_Load" "\\ccaintranet.com\dfs-dc-01\Data\Retail\Actian11\MapDesigner\fc.tf.xml"
DJENGINE -l "Y:\st\Retail\FTP\2022-03-23\TMSfuture_cost_5168_20220323074029.tf.err" -sc "File=Y:\st\Retail\FTP\2022-03-23\TMSfuture_cost_5168_20220323074029.dat" -tc "Server=ABC.stg.sql.ccaintranet.com;Database=ABCTMS_FC;Table=dbo.FC_Load" "\\ccaintranet.com\dfs-dc-01\Data\Retail\Actian11\MapDesigner\fc.tf.xml"
DJENGINE -l "Y:\st\Retail\FTP\2022-03-23\TMSfuture_cost_13272_20220323070111.tf.err" -sc "File=Y:\st\Retail\FTP\2022-03-23\TMSfuture_cost_13272_20220323070111.dat" -tc "Server=ABC.stg.sql.ccaintranet.com;Database=ABCTMS_FC;Table=dbo.FC_Load" "\\ccaintranet.com\dfs-dc-01\Data\Retail\Actian11\MapDesigner\fc.tf.xml"

The file contains thousands of these DJENGINE statements. I have only shown three here.

I have an array $data that holds thousands of file names. I have only shown two here:

TMSfuture_cost_2554_20220323065246.dat
TMSfuture_cost_5168_20220323074029.dat

I want to compare the items in the array against the content of the file being read. If an item in the array matches, I need to delete the whole block of code from:

   " DJEngine.... to .... tf.xml"

Since these two array items match the file content, my expected output is:

DJENGINE -l "Y:\st\Retail\FTP\2022-03-23\TMSfuture_cost_13272_20220323070111.tf.err" -sc "File=Y:\st\Retail\FTP\2022-03-23\TMSfuture_cost_13272_20220323070111.dat" -tc "Server=ABC.stg.sql.ccaintranet.com;Database=ABCTMS_FC;Table=dbo.FC_Load" "\\ccaintranet.com\dfs-dc-01\Data\Retail\Actian11\MapDesigner\fc.tf.xml"

I tried using:

$pathOFDj='Y:\St\Retail\FTP\2022-03-23\DJ.bat'
foreach($d in $data){
  foreach($line in Get-Content $pathOFDj) {
    if($line -contains $d){

    }
    else{
      $newLine =$line
    }
  }
}

echo $newLine

The block which I mentioned is not being removed.

CodePudding user response:

Your current script has 3 major problems:

  • Wrong comparison operator - -contains is for testing collection containment, not substrings - you'd want something like -like or -match for string comparisons instead
  • Repeated positives - even if one string from $data is found in a specific line, the line will still be copied/included on the next iteration of the outer loop because it won't contain the remaining $data substrings
  • Quadratic time complexity - testing every single item in $data against every single line in the file gives your script a bounding time complexity of O(N*M), where N is the number of $data items and M is the number of lines in the file. This means your code is going to get significantly slower as you increase the input size. By structuring your code differently this can be improved somewhat, and by structuring your data differently this can be improved massively
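To make the first point concrete, here is a small sketch of how the operators behave on a single line. The line is a shortened, hypothetical stand-in for one DJENGINE statement, and the variable names are only for illustration:

```powershell
# Shortened, hypothetical stand-in for one line of DJ.bat and one item of $data
$line = 'DJENGINE -sc "File=Y:\st\TMSfuture_cost_2554_20220323065246.dat"'
$name = 'TMSfuture_cost_2554_20220323065246.dat'

# -contains asks "is $name an element of the collection on the left?"
# A single string acts as a one-element collection, so this is $false
$containsResult = $line -contains $name

# -like performs wildcard matching against the string itself
$likeResult = $line -like "*$name*"

# -match performs regex matching; escape the name so '.' is taken literally
$matchResult = $line -match [regex]::Escape($name)

# The .NET String.Contains method is a case-sensitive substring test
$methodResult = $line.Contains($name)

"$containsResult $likeResult $matchResult $methodResult"   # False True True True
```

Any of the last three would fix the comparison itself, but on their own they would still leave the repeated-positives and nested-loop problems in place.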

Instead of attempting to solve these problems point-by-point, I'm gonna show you how to prepare the $data array and parse the input file for better performance (and correctness of course).

This will consist of two steps:

  • Organize the $data items into a data structure that allows for constant-time lookups - something that can tell us, as quickly as possible, whether a specific string is part of the collection or not.
  • Parse the relevant file name out of each line in the file, use the extracted file name to test if the collection from the previous step contains it, and use that to filter out the relevant line

$pathOFDj='Y:\St\Retail\FTP\2022-03-23\DJ.bat'

# Read file names into array
$data = Get-Content path\to\listOfFileNames.txt

# Create a hashset - this will provide super-fast lookups
$fileNameSet = [System.Collections.Generic.HashSet[string]]::new([System.StringComparer]::CurrentCultureIgnoreCase)
$data |ForEach-Object { [void]$fileNameSet.Add($_) }

# Now we can start parsing and filtering the file
Get-Content $pathOFDj |Where-Object {
  # attempt to extract file name, then use the extracted file name to test if it's one of the relevant file names
  -not($_ -match '-sc "File=[^"]*?\\([^\\"]+)"' -and $fileNameSet.Contains($Matches[1]))
} |Set-Content path\to\modified_dj.bat

The statement -not($_ -match '-sc "File=[^"]*?\\([^\\"]+)"' -and $fileNameSet.Contains($Matches[1])) will only evaluate to $false if a file name was successfully extracted and found to be contained in the hashset - otherwise, it'll evaluate to $true, and Where-Object will let the line filter through as expected.
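As a rough check of that logic without touching any files, the same filter can be run against a few in-memory lines. This is only a sketch: the lines are shortened, hypothetical versions of the ones in the question, and $lines, $pattern and $kept are names made up for the example.

```powershell
# Hypothetical, shortened stand-ins for the lines of DJ.bat
$lines = @(
  'DJENGINE -l "Y:\st\TMSfuture_cost_2554_20220323065246.tf.err" -sc "File=Y:\st\TMSfuture_cost_2554_20220323065246.dat" -tc "Table=dbo.FC_Load" "\\share\fc.tf.xml"'
  'DJENGINE -l "Y:\st\TMSfuture_cost_5168_20220323074029.tf.err" -sc "File=Y:\st\TMSfuture_cost_5168_20220323074029.dat" -tc "Table=dbo.FC_Load" "\\share\fc.tf.xml"'
  'DJENGINE -l "Y:\st\TMSfuture_cost_13272_20220323070111.tf.err" -sc "File=Y:\st\TMSfuture_cost_13272_20220323070111.dat" -tc "Table=dbo.FC_Load" "\\share\fc.tf.xml"'
  'REM a line without a -sc argument'
)

# The two file names from the question
$data = @(
  'TMSfuture_cost_2554_20220323065246.dat'
  'TMSfuture_cost_5168_20220323074029.dat'
)

# Build the case-insensitive hashset for constant-time lookups
$fileNameSet = [System.Collections.Generic.HashSet[string]]::new([System.StringComparer]::CurrentCultureIgnoreCase)
$data | ForEach-Object { [void]$fileNameSet.Add($_) }

# Capture group 1 is everything after the last backslash inside -sc "File=..."
$pattern = '-sc "File=[^"]*?\\([^\\"]+)"'

# Keep a line unless a file name was extracted AND it is in the set
$kept = @($lines | Where-Object {
  -not ($_ -match $pattern -and $fileNameSet.Contains($Matches[1]))
})

$kept   # the 13272 line and the REM line remain
```

Because -and short-circuits, $Matches[1] is only read when -match has just succeeded on the current line, so a stale value from an earlier line is never used. Lines with no -sc argument at all, like the REM line here, pass through untouched.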
