How to concatenate every four lines for the first four lines of a file

Time:12-17

I'm not sure how best to word this question, but I'm trying to concatenate the first four lines with the next four lines, and so on until the end of the file.

My data looks like:

aggaacgtgagttgaaaattgaagcgacaaacttggtttcatgtcctgtttgtggaaaga
catctattgttagagacaatatattgtctgatctgacttatctgcatgttc---------
 .     **    ..* * *. * .* * .*..**..**  .  * ****.         

gcataaaaggaatggacacaatcataaatgaacatcttgatatctgccttacaagaaggt
----------tgtggattcctttctttttccttttggagatatctgccttacaagaaggt
           .****. *  *. *   *   . *   **********************

ccaaacgaaaacttacccaacgcacactacttcagtttggtgttggatcaagtaccaaaa
ccaaacgaaaacttacccaacgcacactacttcagtttggtgttggatcaagtaccaaaa
************************************************************

I want to merge each four-line group onto the group before it, line by line (first line onto first line, second onto second, and so on), to create a horizontal layout that looks like:

aggaacgtgagttgaaaattgaagcgacaaacttggtttcatgtcctgtttgtggaaagagcataaaaggaatggacacaatcataaatgaacatcttgatatctgccttacaagaaggtccaaacgaaaacttacccaacgcacactacttcagtttggtgttggatcaagtaccaaaa
catctattgttagagacaatatattgtctgatctgacttatctgcatgttc-------------------tgtggattcctttctttttccttttggagatatctgccttacaagaaggtccaaacgaaaacttacccaacgcacactacttcagtttggtgttggatcaagtaccaaaa
 .     **    ..* * *. * .* * .*..**..**  .  * ****.                    .****. *  *. *   *   . *   **********************************************************************************

I know I can use paste - - to join every pair of lines (removing every other newline), but what would be the simplest way to join the first line of every four-line group together, the second line of every group together, and so on?
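Building on the paste idea, one possible approach is to pull out each line position separately and let paste -s join those lines into one. This is a minimal sketch that assumes every group is exactly three data lines plus one blank line; join_by_position is a hypothetical helper name and the sample data is made up for the demo.

```shell
#!/bin/sh
# Sketch: join line 1 of every 4-line group, then line 2, then line 3.
# Assumes each group is 3 data lines followed by 1 blank line.
# join_by_position is a hypothetical helper name, not a standard tool.
join_by_position() {
    for k in 1 2 3; do
        # NR % 4 == k selects lines k, k+4, k+8, ... of the file
        awk -v k="$k" 'NR % 4 == k' "$1" |
            paste -s -d '\0' -   # -s joins all input lines into one; '\0' means no delimiter
    done
}

# Small demo on made-up sample data (two groups)
sample=$(mktemp)
printf 'AAA\nBBB\n. *\n\nCCC\nDDD\n* .\n\n' > "$sample"
join_by_position "$sample"
rm -f "$sample"
```

Because each line position is processed in its own pass, the file is read three times; for large inputs a single-pass awk program is cheaper.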

Answer:

You could use this Perl script, passing the file name as an argument or piping the data into it:

#!/usr/bin/env perl

use strict;
use warnings;

my %lines;                   # hash container to store the lines

while (<>) {                 # read lines from the files named in @ARGV, or from stdin
    chomp;                   # remove newline
    my $idx = ($. - 1) % 4;  # calculate index of line [0,4)
    $lines{$idx} .= $_;      # concatenate the current line to what's at $idx
}

# Done, print the result:
for (my $i = 0; $i < 4; ++$i) {
    print $lines{$i} . "\n";
}
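For anyone without Perl at hand, the same "line number modulo 4" idea can be sketched in awk. This assumes the same fixed four-line period; like the Perl script, it prints a fourth, empty line built from the blank separator lines. concat_mod4 is just an illustrative name.

```shell
#!/bin/sh
# Sketch: same modulo-4 indexing as the Perl script, written in awk.
# Assumes a period of 4 lines; slot 3 collects the blank separators,
# so it prints as an empty line, just as in the Perl version.
concat_mod4() {
    awk '{
        idx = (NR - 1) % 4          # 0,1,2,3,0,1,2,3,...
        line[idx] = line[idx] $0    # append the current line to its slot
    }
    END {
        for (i = 0; i < 4; i++)
            print line[i]
    }' "$@"                         # no arguments: read stdin
}

printf 'AAA\nBBB\n. *\n\nCCC\nDDD\n* .\n\n' | concat_mod4
```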

Answer:

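It seems every group of three lines is separated by a blank line. If that's the case, this awk program might be what you're looking for. Setting RS to the empty string puts awk in paragraph mode, so each blank-line-separated block becomes one record, and FS = "\n" makes each line of the block one field: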

awk 'BEGIN { RS = ""; FS = "\n" }
           { for (i = 1; i <= 3; ++i) line[i] = line[i] $i }
     END   { for (i = 1; i <= 3; ++i) print line[i] }
' file
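If the number of lines per block might not always be three, a variant of the paragraph-mode program can track the tallest block instead of hard-coding 3. This is a sketch that assumes blocks are separated by blank lines and that lines should be matched up by their position within each block; concat_blocks is a hypothetical name.

```shell
#!/bin/sh
# Sketch: paragraph-mode variant that does not hard-code 3 lines per block.
# RS = "" makes each blank-line-separated block one record; FS = "\n" makes
# each line of the block one field. n tracks the largest block height seen.
concat_blocks() {
    awk 'BEGIN { RS = ""; FS = "\n" }
         {
             for (i = 1; i <= NF; i++) line[i] = line[i] $i
             if (NF > n) n = NF
         }
         END { for (i = 1; i <= n; i++) print line[i] }' "$@"
}

printf 'AAA\nBBB\n. *\n\nCCC\nDDD\n* .\n' | concat_blocks
```

In paragraph mode, a run of several blank lines still counts as a single separator, so irregular spacing between blocks does not shift the output.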