Home > Mobile >  How can I split a large tab delimited text file into separate files by date field and use the date i
How can I split a large tab delimited text file into separate files by date field and use the date i

Time:08-17

I have a very large text file 400K lines. I need to separate it into multiple files. I currently have it separate by the 3rd ([2]) column and set the text file to that name, but those files are still too large. I would like to separate by name and month if possible.

File looks like this:

7/1/2022 9:00   7/1/2022 10:00  25823-0001
7/1/2022 10:00  7/1/2022 11:00  25823-0001 ...
7/1/2022 11:00  7/1/2022 12:00  25823-0001 ...
7/1/2022 12:00  7/1/2022 13:00  25823-0001 ...
7/2/2022 13:00  7/2/2022 14:00  25823-0001 ...
7/2/2022 14:00  7/2/2022 15:00  25823-0001 ...
7/2/2022 15:00  7/2/2022 16:00  25823-0001 ...
7/2/2022 16:00  7/2/2022 17:00  25823-0001 ...
7/2/2022 17:00  7/2/2022 18:00  25823-0001 ...
8/1/2022 18:00  8/1/2022 19:00  25823-0001 ...
8/1/2022 19:00  8/1/2022 20:00  25823-0001 ...
8/1/2022 20:00  8/1/2022 21:00  25823-0001 ...
8/1/2022 21:00  8/1/2022 22:00  25823-0001 ...
8/1/2022 22:00  8/1/2022 23:00  25823-0001 ...
8/1/2022 23:00  8/2/2022 0:00   25823-0001 ...
8/2/2022 0:00   8/2/2022 1:00   25823-0001 ...
8/2/2022 1:00   8/2/2022 2:00   25823-0001 ...

So the file would end up being 25823-0001_202207.txt and contain

7/1/2022 9:00   7/1/2022 10:00  25823-0001
7/1/2022 10:00  7/1/2022 11:00  25823-0001 ...
7/1/2022 11:00  7/1/2022 12:00  25823-0001 ...
7/1/2022 12:00  7/1/2022 13:00  25823-0001 ...
7/2/2022 13:00  7/2/2022 14:00  25823-0001 ...
7/2/2022 14:00  7/2/2022 15:00  25823-0001 ...
7/2/2022 15:00  7/2/2022 16:00  25823-0001 ...
7/2/2022 16:00  7/2/2022 17:00  25823-0001 ...
7/2/2022 17:00  7/2/2022 18:00  25823-0001 ...

25823-0001_202208.txt would contain

8/1/2022 18:00  8/1/2022 19:00  25823-0001 ...
8/1/2022 19:00  8/1/2022 20:00  25823-0001 ...
8/1/2022 20:00  8/1/2022 21:00  25823-0001 ...
8/1/2022 21:00  8/1/2022 22:00  25823-0001 ...
8/1/2022 22:00  8/1/2022 23:00  25823-0001 ...
8/1/2022 23:00  8/2/2022 0:00   25823-0001 ...
8/2/2022 0:00   8/2/2022 1:00   25823-0001 ...
8/2/2022 1:00   8/2/2022 2:00   25823-0001 ...

I currently use this code to split the files:

foreach($file in $source){
  $fileIn = New-Object IO.StreamReader $file
  #$header = $fileIn.ReadLine()
while ($line = $fileIn.ReadLine())
  {
    $newFile = "$(($line -split "\t")[2]).txt"
    if ($newFile -ne $currentFile)
      {...

I tried editing the code to this:

$newFile = "$(($line -split "\t")[2,0]).txt"

but obviously, that wont work due to the format of the date field.

Update The below solution seems to work, but I still get this error message:

Exception calling "ParseExact" with "3" argument(s): "String was not recognized as a valid DateTime."

I tried adding .trim() to the $fields[0] line thinking maybe there was whitespace, but that did not work.

CodePudding user response:

Split the line into a variable first, then use the DateTime class to parse and reformat the date:

$fields    = $line -split '\t'
$namePart1 = $fields[2]
$date      = [DateTime]::ParseExact($fields[0], 'M\/d\/yyyy H\:m', [CultureInfo]::InvariantCulture)
$namePart2 = $date.ToString('yyyyMM')
$newFile   = "${namePart1}_$namePart2"
  • In the ParseExact() call, the 2nd argument specifies a custom date and time format. Certain characters like / and : are special characters which must be \-escaped to use them as literal characters.
  • In "${namePart1}_$namePart2", the curly braces are used to make sure the underscore is not interpreted as part of the variable name.
  • Related