I have a file, DJ.bat, whose contents I am reading with PowerShell:
DJENGINE -l "Y:\st\Retail\FTP\2022-03-23\TMSfuture_cost_2554_20220323065246.tf.err" -sc "File=Y:\st\Retail\FTP\2022-03-23\TMSfuture_cost_2554_20220323065246.dat" -tc "Server=ABC.stg.sql.ccaintranet.com;Database=ABCTMS_FC;Table=dbo.FC_Load" "\\ccaintranet.com\dfs-dc-01\Data\Retail\Actian11\MapDesigner\fc.tf.xml"
DJENGINE -l "Y:\st\Retail\FTP\2022-03-23\TMSfuture_cost_5168_20220323074029.tf.err" -sc "File=Y:\st\Retail\FTP\2022-03-23\TMSfuture_cost_5168_20220323074029.dat" -tc "Server=ABC.stg.sql.ccaintranet.com;Database=ABCTMS_FC;Table=dbo.FC_Load" "\\ccaintranet.com\dfs-dc-01\Data\Retail\Actian11\MapDesigner\fc.tf.xml"
DJENGINE -l "Y:\st\Retail\FTP\2022-03-23\TMSfuture_cost_13272_20220323070111.tf.err" -sc "File=Y:\st\Retail\FTP\2022-03-23\TMSfuture_cost_13272_20220323070111.dat" -tc "Server=ABC.stg.sql.ccaintranet.com;Database=ABCTMS_FC;Table=dbo.FC_Load" "\\ccaintranet.com\dfs-dc-01\Data\Retail\Actian11\MapDesigner\fc.tf.xml"
The file contains thousands of DJENGINE lines like these; I have shown only three here.
I also have an array, $data, which contains thousands of file names. I have shown only two here:
TMSfuture_cost_2554_20220323065246.dat
TMSfuture_cost_5168_20220323074029.dat
I want to compare the items in the array against the contents of the file being read. If an array item matches, I need to delete the whole line, from "DJENGINE ..." to "... tf.xml".
Since these two array items match the file content, my expected output is:
DJENGINE -l "Y:\st\Retail\FTP\2022-03-23\TMSfuture_cost_13272_20220323070111.tf.err" -sc "File=Y:\st\Retail\FTP\2022-03-23\TMSfuture_cost_13272_20220323070111.dat" -tc "Server=ABC.stg.sql.ccaintranet.com;Database=ABCTMS_FC;Table=dbo.FC_Load" "\\ccaintranet.com\dfs-dc-01\Data\Retail\Actian11\MapDesigner\fc.tf.xml"
I tried using:
$pathOFDj='Y:\St\Retail\FTP\2022-03-23\DJ.bat'
foreach($d in $data){
foreach($line in Get-Content $pathOFDj) {
if($line -contains $d){
}
else{
$newLine =$line
}
}
}
echo $newLine
The lines I mentioned are not being removed.
Answer:
Your current script has 3 major problems:
- Wrong comparison operator: -contains is for testing collection containment, not substrings. You'd want something like -like or -match for string comparisons instead.
- Repeated positives: even if one string from $data is found in a specific line, the line will still be copied/included on the next iteration of the outer loop, because it won't contain the remaining $data substrings.
- Quadratic time complexity: testing every single item in $data against every single line in the file gives your script a time complexity of O(N*M), where N is the number of $data items and M is the number of lines in the file. This means your code will get significantly slower as the input grows. Structuring your code differently can improve this somewhat, and structuring your data differently can improve it massively.
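The first two problems are easy to see on a toy input. The sketch below uses hypothetical, shortened stand-ins ($lines, $names) for the DJ.bat lines and the $data items. It shows that -contains does a whole-item comparison while -like with leading and trailing * wildcards does a substring test, and it fixes the repeated-positives problem by checking all names for a line before deciding whether to keep it. It is still the O(N*M) approach, so it only illustrates correctness, not the faster design described next.

```powershell
# Hypothetical, shortened stand-ins for the DJ.bat lines and the $data file names
$lines = @(
    'DJENGINE -sc "File=Y:\st\a_1.dat" fc.tf.xml'
    'DJENGINE -sc "File=Y:\st\b_2.dat" fc.tf.xml'
    'DJENGINE -sc "File=Y:\st\c_3.dat" fc.tf.xml'
)
$names = @('a_1.dat', 'b_2.dat')

# -contains checks whether a collection holds an equal item; a scalar string on
# the left is treated as a one-item collection, so this is a whole-string comparison
$lines[0] -contains 'a_1.dat'   # False
# -like with a leading and trailing * performs a (case-insensitive) substring test
$lines[0] -like '*a_1.dat*'     # True

# Naive but correct filter: a line is kept only if NONE of the names occurs in it,
# so a line is never emitted just because some other name failed to match.
# Still O(N*M): every name is tested against every line.
$kept = foreach ($line in $lines) {
    $isMatch = $false
    foreach ($name in $names) {
        # Escape so wildcard characters such as [ ] in a file name are taken literally
        $pattern = '*' + [System.Management.Automation.WildcardPattern]::Escape($name) + '*'
        if ($line -like $pattern) {
            $isMatch = $true
            break   # leaves only the inner foreach loop
        }
    }
    if (-not $isMatch) { $line }
}
$kept   # only the c_3.dat line remains
```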
Instead of attempting to solve these problems point by point, I'm going to show you how to prepare the $data array and parse the input file for better performance (and correctness, of course).
This will consist of two steps:
- Organize the $data items into a data structure that allows for constant-time lookups: something that can tell us, as quickly as possible, whether a specific string is part of the collection or not.
- Parse the relevant file name out of each line in the file, use the extracted file name to test whether the collection from the previous step contains it, and use that result to filter out the matching lines.
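To see what the parsing step extracts, here is a minimal sketch that runs the regular expression used in the answer against the first sample line from the question. The names $line and $fileName are illustrative only. The lazy [^"]+? part walks through the directory path, and the capture group ([^\\"]+) grabs the last path segment, which cannot contain a backslash or a quote.

```powershell
# First sample line from the question
$line = 'DJENGINE -l "Y:\st\Retail\FTP\2022-03-23\TMSfuture_cost_2554_20220323065246.tf.err" -sc "File=Y:\st\Retail\FTP\2022-03-23\TMSfuture_cost_2554_20220323065246.dat" -tc "Server=ABC.stg.sql.ccaintranet.com;Database=ABCTMS_FC;Table=dbo.FC_Load" "\\ccaintranet.com\dfs-dc-01\Data\Retail\Actian11\MapDesigner\fc.tf.xml"'

$fileName = $null
if ($line -match '-sc "File=[^"]+?\\([^\\"]+)"') {
    # $Matches[0] holds the whole match, $Matches[1] the captured file name
    $fileName = $Matches[1]
}
$fileName   # TMSfuture_cost_2554_20220323065246.dat
```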
$pathOFDj='Y:\St\Retail\FTP\2022-03-23\DJ.bat'
# Read file names into array
$data = Get-Content path\to\listOfFileNames.txt
# Create a hashset - this will provide super-fast lookups
$fileNameSet = [System.Collections.Generic.HashSet[string]]::new([System.StringComparer]::CurrentCultureIgnoreCase)
$data |ForEach-Object { [void]$fileNameSet.Add($_) }
# Now we can start parsing and filtering the file
Get-Content $pathOFDj |Where-Object {
# attempt to extract file name, then use the extracted file name to test if it's one of the relevant file names
-not($_ -match '-sc "File=[^"]+?\\([^\\"]+)"' -and $fileNameSet.Contains($Matches[1]))
} |Set-Content path\to\modified_dj.bat
The statement -not($_ -match '-sc "File=[^"]+?\\([^\\"]+)"' -and $fileNameSet.Contains($Matches[1])) will only evaluate to $false if a file name was successfully extracted and found in the hashset. Otherwise it evaluates to $true, and Where-Object lets the line through as expected.
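The filter can be checked without touching the real files. This sketch feeds the three sample lines and the two file names from the question through the same Where-Object test. The in-memory $djLines variable, and filling the HashSet through its constructor with OrdinalIgnoreCase (culture-independent) instead of calling Add with CurrentCultureIgnoreCase, are illustrative assumptions rather than part of the answer; either way the lookups stay case-insensitive.

```powershell
# The three sample lines from the question, kept in memory instead of read from DJ.bat
$djLines = @(
    'DJENGINE -l "Y:\st\Retail\FTP\2022-03-23\TMSfuture_cost_2554_20220323065246.tf.err" -sc "File=Y:\st\Retail\FTP\2022-03-23\TMSfuture_cost_2554_20220323065246.dat" -tc "Server=ABC.stg.sql.ccaintranet.com;Database=ABCTMS_FC;Table=dbo.FC_Load" "\\ccaintranet.com\dfs-dc-01\Data\Retail\Actian11\MapDesigner\fc.tf.xml"'
    'DJENGINE -l "Y:\st\Retail\FTP\2022-03-23\TMSfuture_cost_5168_20220323074029.tf.err" -sc "File=Y:\st\Retail\FTP\2022-03-23\TMSfuture_cost_5168_20220323074029.dat" -tc "Server=ABC.stg.sql.ccaintranet.com;Database=ABCTMS_FC;Table=dbo.FC_Load" "\\ccaintranet.com\dfs-dc-01\Data\Retail\Actian11\MapDesigner\fc.tf.xml"'
    'DJENGINE -l "Y:\st\Retail\FTP\2022-03-23\TMSfuture_cost_13272_20220323070111.tf.err" -sc "File=Y:\st\Retail\FTP\2022-03-23\TMSfuture_cost_13272_20220323070111.dat" -tc "Server=ABC.stg.sql.ccaintranet.com;Database=ABCTMS_FC;Table=dbo.FC_Load" "\\ccaintranet.com\dfs-dc-01\Data\Retail\Actian11\MapDesigner\fc.tf.xml"'
)
# The two sample file names from the question
$data = @(
    'TMSfuture_cost_2554_20220323065246.dat'
    'TMSfuture_cost_5168_20220323074029.dat'
)

# Build the set in one call: HashSet(IEnumerable<string>, IEqualityComparer<string>)
$fileNameSet = [System.Collections.Generic.HashSet[string]]::new([string[]]$data, [System.StringComparer]::OrdinalIgnoreCase)

# Keep a line unless the file name after -sc "File=..." is in the set
$kept = $djLines | Where-Object {
    -not ($_ -match '-sc "File=[^"]+?\\([^\\"]+)"' -and $fileNameSet.Contains($Matches[1]))
}
$kept   # only the TMSfuture_cost_13272_20220323070111 line is left
```

Each line now costs one regex match plus one hash lookup, so the total work grows roughly with the number of lines rather than with lines times file names. To write the result to disk, pipe $kept to Set-Content as in the answer.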