I have a single file containing about 1 GB of data. That data is actually tens of thousands of individual mini files concatenated together. I need to extract each individual one into its own distinct file. So essentially, I need to go from a single file to 30K separate files.
Here is a sample of what my file looks like.
FILENAM1 VER 1 32 D
10/15/87 09/29/87
PREPARED BY ?????
REVISED BY ?????
DESCRIPTION USER DOMAIN
RECORD FILENAM1 VER 1 D SUFFIX -4541
100 05 ST-CTY-CDE-FMHA-4541 DISPLAY
200 10 ST-CDE-FMHA-4541 9(2) DISPLAY
300 10 CTY-CDE-FMHA-4541 9(3) DISPLAY
400 05 NME-CTY-4541 X(20) DISPLAY
500 05 LST-UPDTE-DTE-4541 9(06) DISPLAY
600 05 FILLER X DISPLAY 1REPORT NO. 08
DATA DICTIONARY REPORTER REL 17.0 09/23/21
PAGE 2 DREPORT 008
RECORD REPORT-************************************************************************************************************************************ RECORD RECORD ---- D A T E ----
RECORD NAME LENGTH BUILDER TYPE OCCURRENCES UPDATED CREATED
************************************************************************************************************************************ 0
FILENAM2 VER 1 176 D
03/09/98 02/21/84
PREPARED BY ??????
REVISED BY ??????
DEFINITION
I need to split the files out based upon a match of "VER" in positions 68, 69 and 70. I also need to name each file uniquely. That information is stored on the same line in positions 2-9. In the example above those strings are "FILENAM1" and "FILENAM2".
So just using the example above I would create two output files and they would be named FILENAM1.txt and FILENAM2.txt.
Since I have 30K files I need to split, doing this manually is impossible.
I do have a script that will split a file into multiple files but it will not search for strings by position.
Would anyone be able to assist me with this?
Here is a script that DOES NOT work. Hopefully I can butcher it and get some valid results....
$InputFile = "C:\COPIES.txt"
$Reader = New-Object System.IO.StreamReader($InputFile)
$OPName = @()
While (($Line = $Reader.ReadLine()) -ne $null) {
    If ($Line -match "VER"(67,3)) {
        $OPName = $Line.(2,8)
        $FileName = $OPName[1].Trim()
        Write-Host "Found ... $FileName" -foregroundcolor green
        $OutputFile = "$FileName.txt"
    }
    Add-Content $OutputFile $Line
}
Thank you in advance,
-Ron
CodePudding user response:
I suggest using a switch statement, which offers both convenient and fast line-by-line reading of files via -File and regex matching via -Regex:
$streamWriter = $null
switch -CaseSensitive -Regex -File "C:\COPIES.txt" {
    '^.(.{8}).{58}VER' { # Start of a new embedded file.
        if ($streamWriter) { $streamWriter.Close() } # Close previous output file.
        # Create a new output file.
        $fileName = $Matches[1].Trim() + '.txt'
        $streamWriter = [System.IO.StreamWriter] (Join-Path $PWD.ProviderPath $fileName)
        $streamWriter.WriteLine($_)
    }
    default { # Write subsequent lines to the same file.
        if ($streamWriter) { $streamWriter.WriteLine($_) }
    }
}
$streamWriter.Close()
Note: A solution using the .Substring() method of the [string] type is possible too, but would be more verbose.
The ^.(.{8}).{58} portion of the regex matches the first 67 characters on each line, while capturing those in (1-based) columns 2 through 9 (the file name) via capture group (.{8}), which makes the captured text available in index [1] of the automatic $Matches variable. The VER portion of the regex then ensures that the line only matches if VER is found at column position 68.

For efficient output-file creation, [System.IO.StreamWriter] instances are used, which is much faster than line-by-line Add-Content calls. Additionally, with Add-Content you'd have to ensure that a target file doesn't already exist, as any existing content would otherwise be appended to.
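You can verify the column arithmetic interactively with a synthetic line padded to the assumed fixed-width layout (the sample line below is hypothetical, built so that "VER" lands exactly at column 68):

```powershell
# Column 1 is blank, columns 2-9 hold the name, columns 10-67 are padding.
$line = ' ' + 'FILENAM1' + (' ' * 58) + 'VER 1 32 D'
$line -cmatch '^.(.{8}).{58}VER'   # True
$Matches[1].Trim()                 # FILENAM1
```

If the match unexpectedly returns False on your real data, check for tab characters or trailing-space trimming upstream, both of which would shift the column positions.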