I have a text file structured like this:
[timestamp1] header with space
[timestamp2] data1
[timestamp3] data2
[timestamp4] data3
[timestamp5] ..
[timestamp6] footer with space
[timestamp7] junk
[timestamp8] header with space
[timestamp9] data4
[timestamp10] data5
[timestamp11] ...
[timestamp12] footer with space
[timestamp13] junk
[timestamp14] header with space
[timestamp15] data6
[timestamp16] data7
[timestamp17] data8
[timestamp18] ..
[timestamp19] footer with space
I need to find each part between header and footer and save it in another file. For example, file1 should contain (with or without timestamps; it doesn't matter):
data1
data2
data3
..
and the next block should be saved as file2, and so on. This seems like a routine task, but I haven't found a solution yet.
I have this sed command that finds the first packet.
sed -n "/header/,/footer/{p;/footer/q}" file
But I don't know how to iterate that over the next matches. Maybe I should delete the first match after copying it to another file and repeat the same command?
CodePudding user response:
A very naive approach, coded fast, could be improved, but seems to work, in awk:
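BEGIN {
    i = 1                        # number of the current output file
}
{
    if ($0 ~ /header/) {         # header line: start writing
        write = 1
    } else if ($0 ~ /footer/) {  # footer line: stop, move on to the next file
        write = 0
        close("file" i)
        i = i + 1
    } else if (write == 1) {     # line inside a block
        print $0 > ("file" i)
    }
}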
CodePudding user response:
I would harness GNU AWK for this task in the following way. Let the content of file.txt be
[timestamp1] header with space
[timestamp2] data1
[timestamp3] data2
[timestamp4] data3
[timestamp5] ..
[timestamp6] footer with space
[timestamp7] junk
[timestamp8] header with space
[timestamp9] data4
[timestamp10] data5
[timestamp11] ...
[timestamp12] footer with space
[timestamp13] junk
[timestamp14] header with space
[timestamp15] data6
[timestamp16] data7
[timestamp17] data8
[timestamp18] ..
[timestamp19] footer with space
then
awk '/header/{c+=1;p=1;next}/footer/{close("file" c);p=0}p{print $0 > ("file" c)}' file.txt
produces file1 with content
[timestamp2] data1
[timestamp3] data2
[timestamp4] data3
[timestamp5] ..
and file2 with content
[timestamp9] data4
[timestamp10] data5
[timestamp11] ...
and file3 with content
[timestamp15] data6
[timestamp16] data7
[timestamp17] data8
[timestamp18] ..
Explanation: my code has 3 pattern-action pairs. For a line containing header, I increase the counter c by 1, set the flag p to 1, and go to the next line so that no other action is taken. For a line containing footer, I close the file named file followed by the current counter value and set the flag p to 0. For lines where p is true, I print the current line ($0) to the file named file followed by the current counter value. If required, adjust /header/ and /footer/ so that they match only the lines which are header and footer lines.
(tested in GNU Awk 5.0.1)
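The suggestion above about tightening the patterns could look like the following. This is only a minimal sketch, assuming every line starts with a bracketed timestamp and that header and footer lines read exactly "header with space" and "footer with space" after it. The file name sample.txt and its shortened content are made up for illustration. Since the question says the timestamps are optional, the sketch also strips them from the output.

```shell
# Small sample in the question's format (timestamps are placeholders).
cat > sample.txt <<'EOF'
[timestamp1] header with space
[timestamp2] data1
[timestamp3] data2
[timestamp6] footer with space
[timestamp7] junk
[timestamp8] header with space
[timestamp9] data4
[timestamp12] footer with space
EOF

awk '
  # Header: bracketed timestamp followed by exactly "header with space".
  /^\[[^]]*\] header with space$/ { c += 1; p = 1; next }
  # Footer: close the finished output file and stop writing.
  /^\[[^]]*\] footer with space$/ { close("file" c); p = 0 }
  # Inside a block: drop the leading "[...] " and write what is left.
  p { sub(/^\[[^]]*\] /, ""); print > ("file" c) }
' sample.txt
```

With this sample, file1 should hold data1 and data2 and file2 should hold data4. Anchoring with ^ and $ means a data line that merely contains the word header or footer no longer starts or ends a block.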