I have a very large text file 400K lines. I need to separate it into multiple files. I currently have it separate by the 3rd ([2]) column and set the text file to that name, but those files are still too large. I would like to separate by name and month if possible.
File looks like this:
7/1/2022 9:00 7/1/2022 10:00 25823-0001
7/1/2022 10:00 7/1/2022 11:00 25823-0001 ...
7/1/2022 11:00 7/1/2022 12:00 25823-0001 ...
7/1/2022 12:00 7/1/2022 13:00 25823-0001 ...
7/2/2022 13:00 7/2/2022 14:00 25823-0001 ...
7/2/2022 14:00 7/2/2022 15:00 25823-0001 ...
7/2/2022 15:00 7/2/2022 16:00 25823-0001 ...
7/2/2022 16:00 7/2/2022 17:00 25823-0001 ...
7/2/2022 17:00 7/2/2022 18:00 25823-0001 ...
8/1/2022 18:00 8/1/2022 19:00 25823-0001 ...
8/1/2022 19:00 8/1/2022 20:00 25823-0001 ...
8/1/2022 20:00 8/1/2022 21:00 25823-0001 ...
8/1/2022 21:00 8/1/2022 22:00 25823-0001 ...
8/1/2022 22:00 8/1/2022 23:00 25823-0001 ...
8/1/2022 23:00 8/2/2022 0:00 25823-0001 ...
8/2/2022 0:00 8/2/2022 1:00 25823-0001 ...
8/2/2022 1:00 8/2/2022 2:00 25823-0001 ...
So the file would end up being 25823-0001_202207.txt and contain
7/1/2022 9:00 7/1/2022 10:00 25823-0001
7/1/2022 10:00 7/1/2022 11:00 25823-0001 ...
7/1/2022 11:00 7/1/2022 12:00 25823-0001 ...
7/1/2022 12:00 7/1/2022 13:00 25823-0001 ...
7/2/2022 13:00 7/2/2022 14:00 25823-0001 ...
7/2/2022 14:00 7/2/2022 15:00 25823-0001 ...
7/2/2022 15:00 7/2/2022 16:00 25823-0001 ...
7/2/2022 16:00 7/2/2022 17:00 25823-0001 ...
7/2/2022 17:00 7/2/2022 18:00 25823-0001 ...
25823-0001_202208.txt would contain
8/1/2022 18:00 8/1/2022 19:00 25823-0001 ...
8/1/2022 19:00 8/1/2022 20:00 25823-0001 ...
8/1/2022 20:00 8/1/2022 21:00 25823-0001 ...
8/1/2022 21:00 8/1/2022 22:00 25823-0001 ...
8/1/2022 22:00 8/1/2022 23:00 25823-0001 ...
8/1/2022 23:00 8/2/2022 0:00 25823-0001 ...
8/2/2022 0:00 8/2/2022 1:00 25823-0001 ...
8/2/2022 1:00 8/2/2022 2:00 25823-0001 ...
I currently use this code to split the files:
foreach($file in $source){
$fileIn = New-Object IO.StreamReader $file
#$header = $fileIn.ReadLine()
while ($line = $fileIn.ReadLine())
{
$newFile = "$(($line -split "\t")[2]).txt"
if ($newFile -ne $currentFile)
{...
I tried editing the code to this:
$newFile = "$(($line -split "\t")[2,0]).txt"
but obviously, that wont work due to the format of the date field.
Update The below solution seems to work, but I still get this error message:
Exception calling "ParseExact" with "3" argument(s): "String was not recognized as a valid DateTime."
I tried adding .trim() to the $fields[0] line thinking maybe there was whitespace, but that did not work.
CodePudding user response:
Split the line into a variable first, then use the DateTime
class to parse and reformat the date:
$fields = $line -split '\t'
$namePart1 = $fields[2]
$date = [DateTime]::ParseExact($fields[0], 'M\/d\/yyyy H\:m', [CultureInfo]::InvariantCulture)
$namePart2 = $date.ToString('yyyyMM')
$newFile = "${namePart1}_$namePart2"
- In the
ParseExact()
call, the 2nd argument specifies a custom date and time format. Certain characters like/
and:
are special characters which must be\
-escaped to use them as literal characters. - In
"${namePart1}_$namePart2"
, the curly braces are used to make sure the underscore is not interpreted as part of the variable name.