My file temp.txt looks like this:
00ABC
PQR123400
00XYZ001234
012345
0012233
I want to split the file on the pattern '\r\n00'. In this case temp.txt should be split into 3 files:
first.txt:
00ABC
PQR123400
second.txt:
00XYZ001234
012345
third.txt:
0012233
I am trying to use csplit to match the pattern '\r\n00', but it reports an invalid pattern. Can someone please help me match this exact pattern using csplit?
CodePudding user response:
With your shown samples, please try the following awk code. Written and tested in GNU awk.
This code will create files named 1.txt, 2.txt, and so on. It also closes each output file after writing to it, so we don't run into the infamous "too many open files" error.
awk -v RS='\r?\n00' -v count="1" '
{
outputFile=(count++ ".txt")
rt=RT
sub(/\r?\n/,"",rt)
if(!rt){
sub(/\n+/,"")
rt=prevRT
}
printf("%s%s\n",(count>2?rt:""),$0) > outputFile
close(outputFile)
prevRT=rt
}
' Input_file
Explanation: a detailed, line-by-line explanation of the above code.
awk -v RS='\r?\n00' -v count="1" ' ##Starting the awk program here, setting RS to \r?\n00 and count to 1.
{
outputFile=(count++ ".txt") ##Building the output file name from count (incremented every time this line runs) followed by .txt.
rt=RT ##Copying the value of RT (the text that matched RS for this record) into rt.
sub(/\r?\n/,"",rt) ##Removing the leading \r?\n from rt, which leaves 00.
if(!rt){ ##If rt is empty (the last record, where RS did not match) then do the following.
sub(/\n+/,"") ##Removing the first run of 1 or more new lines from the record.
rt=prevRT ##Assigning the value of prevRT to rt.
}
printf("%s%s\n",(count>2?rt:""),$0) > outputFile ##Printing rt (from the 2nd record onwards) followed by the current record into outputFile.
close(outputFile) ##Closing outputFile in the backend.
prevRT=rt ##Saving rt in prevRT for the next record.
}
' Input_file ##Mentioning Input_file name here.
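On the csplit part of the question: csplit expects a pattern argument in the form /REGEXP/, and it matches that regex against one line at a time, so a pattern containing \r\n has nothing to match. Since '\r\n00' in practice means "a line that starts with 00", a line-oriented split is another option. Below is a minimal sketch of that alternative in plain awk (this is not the RS/RT method above). The split_demo directory and the names dir, out and n are only illustrative assumptions, not anything from the question.

```shell
#!/bin/sh
# Recreate the sample from the question, with CRLF line endings,
# inside a scratch directory (split_demo is a made-up name).
mkdir -p split_demo
printf '00ABC\r\nPQR123400\r\n00XYZ001234\r\n012345\r\n0012233\r\n' > split_demo/temp.txt

# Start a new numbered file at the first line and at every line beginning with 00.
awk -v dir="split_demo" '
NR == 1 || /^00/ {
    if (out != "") close(out)      # close the previous piece before opening the next
    out = dir "/" (++n) ".txt"     # split_demo/1.txt, split_demo/2.txt, ...
}
{ print > out }                    # the \r at the end of each line is kept as-is
' split_demo/temp.txt

ls split_demo
```

Because the lines are written unchanged, the CRLF endings of the input carry over to every piece. If GNU csplit is available, csplit -z temp.txt '/^00/' '{*}' should produce the same three pieces, named xx00, xx01 and xx02 by default.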