Home > OS >  How to remove empty cells in a row and combine every other row in tab delimited file
How to remove empty cells in a row and combine every other row in tab delimited file

Time:09-29

I have Table 1 which has thousands of rows that looks like this:

chr1    4399801 4400245 peak_12659  719  .   32.37675    -1  1.92924 222
        1       444                         
chr1    2495548 2495992 peak_11970  542  .   36.95443    -1  2.58372 222
        1       444                         
chr1    3572002 3572264 peak_901    1000 .   148.62292   -1  3.94096 145
        1       262          

I want to remove the empty cells that appear in every other row (under chr1 in each row above), then combine these to each preceding row so the final table appears like this:

Table 2:

chr1    4399801 4400245 peak_12659  719     .   32.37675    -1  1.92924 222  1  444
chr1    2495548 2495992 peak_11970  542     .   36.95443    -1  2.58372 222  1  444
chr1    3572002 3572264 peak_901    1000    .   148.62292   -1  3.94096 145  1  262

How can I accomplish this?

Edit- In response to @Cyrus: I found it very difficult to find answers to this question. But I stumbled upon this thread (not exactly what I'm trying to accomplish) and tried the following:

awk '{printf "%s%s",$0,(NR%2?FS:RS)}' Table1.txt > Table2.txt

This command did not merge the alternating rows correctly and in some instances, combined cells instead: Screenshot here

I also tried:

xargs -n2 < Table1.txt > Table2.txt

Each row in the output contains two merged cells: Screenshot 2

CodePudding user response:

With GNU sed. Replace every \n\t with \t:

sed -z 's/\n\t/\t/g' file

Output:

chr1    4399801 4400245 peak_12659      719     .       32.37675        -1      1.92924 222     1       444
chr1    2495548 2495992 peak_11970      542     .       36.95443        -1      2.58372 222     1       444
chr1    3572002 3572264 peak_901        1000    .       148.62292       -1      3.94096 145     1       262

CodePudding user response:

It's not clear (to me) what the input and output field delimiters are supposed to be so I'm just going to strip out all white space and use an output field delimiter of tab (\t):

awk '
BEGIN    { OFS="\t" }
         { $1=$1 }                    # remove excessive white spaces
FNR%2==1 { a=$0 }                     # save odd numbered line
FNR%2==0 { print a,$0 }               # print odd numbered line plus current (even numbered) line
' file

This generates:

chr1    4399801 4400245 peak_12659      719     .       32.37675        -1      1.92924 222     1       444
chr1    2495548 2495992 peak_11970      542     .       36.95443        -1      2.58372 222     1       444
chr1    3572002 3572264 peak_901        1000    .       148.62292       -1      3.94096 145     1       262
  • Related