Powershell: Split a single file into multiple files - using string match criteria


I have a single file that contains 1 GB worth of data. This data is actually tens of thousands of individual mini files. I need to extract each individual mini file and place it in its own separate, distinct file. So essentially, I need to go from a single file to 30K separate files.

Here is a sample of what my file looks like.

FILENAM1 VER 1 32 D
10/15/87 09/29/87
PREPARED BY ?????
REVISED BY ?????
DESCRIPTION USER DOMAIN
RECORD FILENAM1 VER 1 D SUFFIX -4541
100 05 ST-CTY-CDE-FMHA-4541 DISPLAY
200 10 ST-CDE-FMHA-4541 9(2) DISPLAY
300 10 CTY-CDE-FMHA-4541 9(3) DISPLAY
400 05 NME-CTY-4541 X(20) DISPLAY
500 05 LST-UPDTE-DTE-4541 9(06) DISPLAY
600 05 FILLER X DISPLAY 1REPORT NO. 08
DATA DICTIONARY REPORTER REL 17.0 09/23/21
PAGE 2 DREPORT 008
RECORD REPORT

-************************************************************************************************************************************ RECORD RECORD ---- D A T E ----
RECORD NAME LENGTH BUILDER TYPE OCCURRENCES UPDATED CREATED
************************************************************************************************************************************ 0
FILENAM2 VER 1 176 D
03/09/98 02/21/84
PREPARED BY ??????
REVISED BY ??????
DEFINITION

I need to split the files out based on a match of "VER" in positions 68, 69, and 70. I also need to name each file uniquely; that information is stored on the same line in positions 2-9. In the example above, those strings are "FILENAM1" and "FILENAM2".

So, just using the example above, I would create two output files, named FILENAM1.txt and FILENAM2.txt.
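To make those positions concrete, here is a rough illustration of how they map onto PowerShell's 0-based .Substring() method (just a sketch run against the first line of my file, not a working solution):

$line = Get-Content "C:\COPIES.txt" -TotalCount 1        # read just the first line
# Columns 68-70 (0-based start 67, length 3) should read "VER" on a header line.
if ($line.Length -ge 70 -and $line.Substring(67, 3) -eq 'VER') {
    # Columns 2-9 (0-based start 1, length 8) hold the 8-character file name.
    $name = $line.Substring(1, 8).Trim()
    "Header line found; its output file would be named $name.txt"
}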

Since I have 30K files I need to split, doing this manually is impossible.

I do have a script that will split a file into multiple files but it will not search for strings by position.

Would anyone be able to assist me with this?

Here is a script that DOES NOT work. Hopefully I can butcher it and get some valid results...

$InputFile = "C:\COPIES.txt"
$Reader = New-Object System.IO.StreamReader($InputFile)
$OPName = @()
While (($Line = $Reader.ReadLine()) -ne $null) {
    # Pseudocode: meant to test for "VER" at (1-based) positions 68-70 -- not valid PowerShell.
    If ($Line -match "VER"(67,3)) {
        # Pseudocode: meant to grab the name from positions 2-9 -- not valid PowerShell either.
        $OPName = $Line.(2,8)
        $FileName = $OPName[1].Trim()
        Write-Host "Found ... $FileName" -foregroundcolor green
        $OutputFile = "$FileName.txt"
    }
    Add-Content $OutputFile $Line
}
                        

Thank you in advance,

-Ron

CodePudding user response:

I suggest using a switch statement, which offers both convenient and fast line-by-line reading of files via -File and regex-matching via -Regex:

$streamWriter = $null
switch -CaseSensitive -Regex -File "C:\COPIES.txt" {
  '^.(.{8}).{58}VER' { # Start of a new embedded file.
    if ($streamWriter) { $streamWriter.Close() } # Close previous output file.
    # Create a new output file.
    $fileName = $Matches[1].Trim() + '.txt'
    $streamWriter = [System.IO.StreamWriter] (Join-Path $PWD.ProviderPath $fileName)
    $streamWriter.WriteLine($_)
  }
  default { # Write subsequent lines to the same file.
    if ($streamWriter) { $streamWriter.WriteLine($_) }
  }
}
if ($streamWriter) { $streamWriter.Close() } # Close the last output file, if one was created.

Note: A solution using the .Substring() method of the [string] type is possible too, but would be more verbose.
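For comparison, here is a rough sketch of what such a .Substring()-based variant could look like (untested against your real data; it assumes every header line is at least 70 characters long):

$streamWriter = $null
switch -File "C:\COPIES.txt" {
  default {
    # A header line has "VER" in (1-based) columns 68-70.
    if ($_.Length -ge 70 -and $_.Substring(67, 3) -ceq 'VER') {
      if ($streamWriter) { $streamWriter.Close() }      # Close the previous output file.
      $fileName = $_.Substring(1, 8).Trim() + '.txt'    # Name from columns 2-9.
      $streamWriter = [System.IO.StreamWriter] (Join-Path $PWD.ProviderPath $fileName)
    }
    if ($streamWriter) { $streamWriter.WriteLine($_) }
  }
}
if ($streamWriter) { $streamWriter.Close() }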

  • The ^.(.{8}).{58} portion of the regex matches the first 67 characters on each line, while capturing the characters in (1-based) columns 2 through 9 (the file name) via capture group (.{8}), which makes the captured text available in index [1] of the automatic $Matches variable. The VER portion of the regex then ensures that the line only matches if VER starts at column 68 (see the short demonstration after these notes).

  • For efficient output-file creation, [System.IO.StreamWriter] instances are used, which is much faster than line-by-line Add-Content calls. Additionally, with Add-Content you'd have to ensure that a target file doesn't already exist, because new output would otherwise be appended to its existing content.

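As a quick demonstration of that capture-group behavior, using a hypothetical line padded so that VER lands in columns 68-70:

# Hypothetical header line: 1 filler char + 8-char name + 58 spaces puts "VER" at column 68.
$line = 'X' + 'FILENAM1' + (' ' * 58) + 'VER 1 32 D'
if ($line -cmatch '^.(.{8}).{58}VER') {
  $Matches[1]                    # -> "FILENAM1" (columns 2-9)
  $Matches[1].Trim() + '.txt'    # -> "FILENAM1.txt"
}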