I need to split a text file that contains 10K lines, some of which contain information about files. Each line of interest contains the string "VER"
in position (column) 68-70 and the name of the file - which is the information I'm trying to extract - is found in position 2-9.
It looks like this...
the file name is ACCRLINK
ACCRLINK VER 1 176 D 03/09/98 02/21/84
I have a script that will split a file but it is rudimentary and I'm unsure how to change it to fit my new needs. The below script will look for a match on NEWTEXT= and then take the next string after the "=" and make that the file name.
However, this substring-based approach does not work.
Can anyone help me alter the script to select by position and capture the file name in another position?
Thank you,
-Ron
$InputFile = "C:\RECORDS_cpy.txt"
$Reader = New-Object System.IO.StreamReader($InputFile)
#$a = 1
$OPName = @()
While (($Line = $Reader.ReadLine()) -ne $null) {
    If ($Line -match "NEWTEXT=") {
        $OPName = $Line.Split("=")
        $FileName = $OPName[1].Trim()
        Write-Host "Found ... $FileName" -foregroundcolor green
        $OutputFile = "$FileName.txt"
        #$a
    }
    Add-Content $OutputFile $Line
}
CodePudding user response:
I suggest using a switch statement, which offers both convenient and fast line-by-line reading of files via -File and regex matching via -Regex:
& {
    switch -CaseSensitive -Regex -File "C:\RECORDS_cpy.txt" {
        # 1 char, then the 8-char name (columns 2-9), then 58 chars, then VER at column 68
        '^.(.{8}).{58}VER' { $Matches[1] + '.txt' }
    }
} | Set-Content $OutputFile
Note that a single Set-Content call is used to write all output produced by the switch statement to a file, which is more efficient than multiple Add-Content calls. If you really meant to append to preexisting $OutputFile content, replace Set-Content with Add-Content above.
Per your later feedback, you're looking for all lines that contain the string VER at the (1-based) column position 68 on each line, and, for matching lines only, want to extract the filename from column positions 2-9 (8 chars. starting in column 2).
Note the use of capture group (.{8}) in the regex, which captures the 8 characters assumed to be the file name, and makes the captured text available in index [1] of the automatic $Matches variable.
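To see the capture group at work, here is a minimal sketch that applies the same regex to a single synthetic line. The helper name Get-NameFromLine and the sample line are made up for illustration; the line is built to match the layout described in the question (name in columns 2-9, VER in columns 68-70) and is not taken from the real file. The Trim() call is an addition, assuming names shorter than 8 characters are padded with spaces.

```powershell
# Hypothetical helper (not from the answer): returns the trimmed name,
# or $null if the line does not match.
function Get-NameFromLine([string]$Line) {
    # -cmatch is the case-sensitive match operator, mirroring -CaseSensitive on switch.
    if ($Line -cmatch '^.(.{8}).{58}VER') {
        # $Matches[1] is the first capture group; Trim() drops padding around shorter names.
        return $Matches[1].Trim()
    }
    return $null
}

# Synthetic 70-column line: 1 filler character, the 8-character name, 58 fillers, then VER.
$sample = ' ' + 'ACCRLINK' + (' ' * 58) + 'VER'
Get-NameFromLine $sample   # outputs ACCRLINK
```

The character counts add up as 1 + 8 + 58 = 67 characters before VER, which puts VER at column 68.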
CodePudding user response:
This code ignores the position issues and looks entirely at the pattern. You need the lines that have "NEWTEXT" with an "=" followed by the desired text and then followed by VER and maybe some other random text.
function GetFileNames([string]$FileName) {
    switch -Regex -File $FileName {
        '^\s*NEWTEXT\s*=\s*(?<File>.*?)\s*VER\s*.*$' { $Matches.File }
        default { continue }
    }
}
$InputFile = "C:\RECORDS_cpy.txt"
$OutFile = "C:\RECORDS_Results.txt"
GetFileNames $InputFile | Out-File $OutFile
When run, the file C:\RECORDS_Results.txt contains this:
ACCRLINK
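For comparison with the pattern-only approach, a position-based split could be sketched roughly as follows. This is only one possible reading of the question: it assumes each VER line starts a new output file named after columns 2-9, that the VER line itself and the lines after it belong to that file, and that lines before the first VER line can be skipped. The function names Get-VerFileName and Split-ByVerLine and the $OutputDir parameter are hypothetical.

```powershell
# Hypothetical helper: returns the trimmed name if VER sits in columns 68-70, else $null.
function Get-VerFileName([string]$Line) {
    # Columns in the question are 1-based, so column 68 is index 67 and column 2 is index 1.
    if ($Line.Length -ge 70 -and $Line.Substring(67, 3) -ceq 'VER') {
        return $Line.Substring(1, 8).Trim()
    }
    return $null
}

# Hypothetical splitter: every VER line switches the current output file.
function Split-ByVerLine([string]$InputFile, [string]$OutputDir) {
    $outputFile = $null
    foreach ($line in (Get-Content -LiteralPath $InputFile)) {
        $name = Get-VerFileName $line
        if ($name) { $outputFile = Join-Path $OutputDir "$name.txt" }
        # Assumption: lines seen before the first VER line are skipped.
        if ($outputFile) { Add-Content -LiteralPath $outputFile -Value $line }
    }
}

# Example call (the output directory is a placeholder):
# Split-ByVerLine -InputFile 'C:\RECORDS_cpy.txt' -OutputDir 'C:\SplitOutput'
```

Add-Content is kept here because the target file changes as the input is read; for a file of about 10K lines the per-line overhead is unlikely to matter.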