split the file based on header and footer lines

Time:12-04

I have a text file structured like this:

[timestamp1] header with space
[timestamp2] data1 
[timestamp3] data2
[timestamp4] data3
[timestamp5] ..
[timestamp6] footer with space
[timestamp7] junk
[timestamp8] header with space
[timestamp9] data4
[timestamp10] data5
[timestamp11] ...
[timestamp12] footer with space
[timestamp13] junk
[timestamp14] header with space
[timestamp15] data6
[timestamp16] data7
[timestamp17] data8
[timestamp18] ..
[timestamp19] footer with space

I need to find each part between a header and a footer and save it in a separate file. For example, file1 should contain (with or without timestamps; it doesn't matter):

data1
data2
data3
..

and the next packet should be saved as file2, and so on. This seems like a routine process, but I haven't found a solution yet.

I have this sed command that finds the first packet.

sed -n "/header/,/footer/{p;/footer/q}" file

But I don't know how to iterate that over the next matches. Maybe I should delete the first match after copying it to another file and then repeat the same command?

CodePudding user response:

A very naive approach, coded fast, could be improved, but seems to work, in awk:

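BEGIN {
    i = 0                       # output files are numbered from file0
}
{
    # Match on a substring, because each line also carries a timestamp.
    if ($0 ~ /header/) {
        write = 1               # start copying after the header line
    } else if ($0 ~ /footer/) {
        write = 0               # stop copying at the footer line
        i = i + 1               # the next packet goes to the next file
    } else {
        if (write == 1) {
            print $0 > ("file" i)
        }
    }
}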

CodePudding user response:

I would use GNU AWK for this task in the following way. Let the content of file.txt be

[timestamp1] header with space
[timestamp2] data1 
[timestamp3] data2
[timestamp4] data3
[timestamp5] ..
[timestamp6] footer with space
[timestamp7] junk
[timestamp8] header with space
[timestamp9] data4
[timestamp10] data5
[timestamp11] ...
[timestamp12] footer with space
[timestamp13] junk
[timestamp14] header with space
[timestamp15] data6
[timestamp16] data7
[timestamp17] data8
[timestamp18] ..
[timestamp19] footer with space

then

awk '/header/{c+=1;p=1;next}/footer/{close("file" c);p=0}p{print $0 > ("file" c)}' file.txt

produces file1 with content

[timestamp2] data1 
[timestamp3] data2
[timestamp4] data3
[timestamp5] ..

and file2 with content

[timestamp9] data4
[timestamp10] data5
[timestamp11] ...

and file3 with content

[timestamp15] data6
[timestamp16] data7
[timestamp17] data8
[timestamp18] ..

Explanation: my code has 3 pattern-action pairs. For a line containing header, I increase the counter c by 1, set the flag p to 1, and go to the next line so that no other action is taken. For a line containing footer, I close the file named file followed by the current counter value and set the flag p to 0. For lines where p is true, I print the current line ($0) to the file named file followed by the current counter value. If required, adjust /header/ and /footer/ so that they match only the lines that really are header and footer lines.

(tested in GNU Awk 5.0.1)
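
The last point of the explanation, tightening the patterns so that only real header and footer lines match, could be sketched as below. This is only a sketch under assumptions: every line starts with a bracketed timestamp followed by a single space, and the header and footer texts are exactly "header with space" and "footer with space" as in the sample. The scratch directory and the shortened sample input are made up for illustration. Since the question says timestamps are optional in the output, this variant also strips them.

```shell
# Work in a scratch directory so existing files are not overwritten (assumption).
cd "$(mktemp -d)"

# Shortened sample input with the same structure as in the question.
cat > file.txt <<'EOF'
[timestamp1] header with space
[timestamp2] data1
[timestamp3] data2
[timestamp4] footer with space
[timestamp5] junk
[timestamp6] header with space
[timestamp7] data4
[timestamp8] footer with space
EOF

awk '
  # Header line (whole line must match): start the next file, skip the header.
  /^\[[^]]*\] header with space$/ { c += 1; p = 1; next }
  # Footer line (whole line must match): close the current file, stop copying.
  /^\[[^]]*\] footer with space$/ { if (p) close("file" c); p = 0; next }
  # Between header and footer: remove the leading "[...] " and write the line.
  p { sub(/^\[[^]]*\] /, ""); print > ("file" c) }
' file.txt
```

With this input, file1 should hold data1 and data2, file2 should hold data4, and the junk line is not written anywhere. Anchoring the patterns with ^ and $ keeps a data line that merely mentions the word header or footer from being treated as a delimiter.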
