I need to read 10K files and search each file, line by line, for the string of characters after the word SUFFIX. Once I capture that string, I need to remove all traces of it from the file and then re-save the file.
With the example below, I would capture -4541. Then I would replace all occurrences of -4541 with NULL (that is, remove them).
Once I have replaced all the occurrences, I save the changes.
Here is my data:
ABSDOMN VER 1 D SUFFIX -4541
05 ST-CTY-CDE-FMHA-4541
10 ST-CDE-FMHA-4541 9(2)
10 CTY-CDE-FMHA-4541 9(3)
05 NME-CTY-4541 X(20)
05 LST-UPDTE-DTE-4541 9(06)
05 FILLER X
Here is a starting script. I can display the line that contains the word SUFFIX, but I cannot capture the string after it, in this case -4541.
$CBLFileList = Get-ChildItem -Path "C:\IDMS" -File -Recurse
# Matches the whole word SUFFIX, but does not capture what follows it.
$regex = "\bSUFFIX\b"
$treat = $false
ForEach ($CBLFile in $CBLFileList) {
    Write-Host "Processing .... $CBLFile" -ForegroundColor Green
    Get-Content -Path $CBLFile.FullName |
        ForEach-Object {
            if ($_ -match $regex) {
                Write-Host "Found Match - $_" -ForegroundColor Green
                $treat = $true
            }
        }
}
Answer:
Try the following:
- Note: Be sure to make backup copies of the input files first, as they will be updated in place. Use -Encoding with Set-Content to specify the desired encoding, if it should be different from Set-Content's default.
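$CBLFileList = Get-ChildItem -LiteralPath "C:\IDMS" -File -Recurse
# Match a space, a hyphen, and one or more digits that directly follow SUFFIX.
$regex = '(?<=SUFFIX) -\d+'
ForEach ($CBLFile in $CBLFileList) {
    # The first line goes into $firstLine, all other lines into $remainingLines.
    $firstLine, $remainingLines = $CBLFile | Get-Content
    if ($firstLine -cmatch $regex) {
        # Escape the captured suffix so that -creplace treats it literally.
        $toRemove = [regex]::Escape($Matches[0].Trim())
        # Remove the suffix from every line and write the file back in place.
        & { $firstLine -creplace $regex; $remainingLines -creplace $toRemove } |
            Set-Content -LiteralPath $CBLFile.FullName
    }
}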
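To see what these steps do without touching any files, here is a minimal sketch that applies the same logic to the sample data from the question, held in memory. It assumes the pattern is meant to match a space, a hyphen, and one or more digits after SUFFIX; the variable names $lines and $result are only illustrative.

```powershell
# Sample data from the question, held in memory instead of read from a file.
$lines = @(
    'ABSDOMN VER 1 D SUFFIX -4541'
    '05 ST-CTY-CDE-FMHA-4541'
    '10 ST-CDE-FMHA-4541 9(2)'
    '10 CTY-CDE-FMHA-4541 9(3)'
    '05 NME-CTY-4541 X(20)'
    '05 LST-UPDTE-DTE-4541 9(06)'
    '05 FILLER X'
)

# Destructuring assignment: the first element goes to $firstLine,
# the remaining elements go to $remainingLines.
$firstLine, $remainingLines = $lines

# Assumed pattern: a space, a hyphen, and one or more digits after SUFFIX.
$regex = '(?<=SUFFIX) -\d+'

$result = if ($firstLine -cmatch $regex) {
    # $Matches[0] is ' -4541'; Trim() removes the leading space.
    $toRemove = [regex]::Escape($Matches[0].Trim())
    $firstLine -creplace $regex           # header line without the suffix
    $remainingLines -creplace $toRemove   # all other lines without the suffix
} else {
    $lines                                # no SUFFIX on the first line: leave as-is
}

$result
```

Once the output looks right, the same pipeline can be pointed at real files, ideally on backup copies first.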
Based on your feedback, the regex that worked for you in the end was (?<=SUFFIX).*$ (which could be simplified to (?<=SUFFIX).+ in this case), i.e. one that captures whatever follows the substring SUFFIX, instead of only capturing a space followed by a - and one or more digits (\d+).
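The difference between the two patterns can be checked on a single line. The sketch below uses a hypothetical helper, Get-SuffixText, and a made-up second header line with a non-numeric suffix (-A12); neither appears in the original question.

```powershell
# Hypothetical helper: returns the trimmed text matched after SUFFIX,
# or nothing if the pattern does not match.
function Get-SuffixText {
    param(
        [string] $Line,
        [string] $Pattern
    )
    if ($Line -cmatch $Pattern) {
        $Matches[0].Trim()
    }
}

$narrow = '(?<=SUFFIX) -\d+'   # space, hyphen, one or more digits
$broad  = '(?<=SUFFIX).+'      # everything that follows SUFFIX

# Header line from the question: both patterns capture the same suffix.
$header = 'ABSDOMN VER 1 D SUFFIX -4541'
Get-SuffixText -Line $header -Pattern $narrow
Get-SuffixText -Line $header -Pattern $broad

# Made-up header with a non-numeric suffix: only the broad pattern matches.
$other = 'ABSDOMN VER 1 D SUFFIX -A12'
Get-SuffixText -Line $other -Pattern $narrow
Get-SuffixText -Line $other -Pattern $broad
```

Because the broad pattern can capture characters that are special in a regex, passing the captured text through [regex]::Escape before using it with -creplace is a sensible precaution.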