Print first and every nth column using awk

I want to print the first column (gene) and all of the raw_counts columns from a tab-separated file.

I've tried:

BEGIN {FS = "\t"}
{for (i = 3; i <= NF; i += 1) printf ("%s%c", $i, i + 1 <= NF ? "\t" : "\n");}

but the output is the same as the input. I run it with:

awk -f prog.awk < input.csv > output.csv

Original header:

gene    raw_counts      median_length_normalized        RPKM    raw_counts      median_length_normalized        RPKM   raw_counts       median_length_normalized        RPKM    raw_counts      median_length_normalized        RPKM   raw_counts

Expected output (header):

gene    raw_counts      raw_counts     raw_counts       raw_counts      raw_counts

Answer:

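A few tweaks:

- start the loop counter at 2 (the first raw_counts column)
- increment the loop counter by 3 on each pass, since raw_counts repeats every third column
- print $1 on its own before the loop, because it sits outside that pattern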
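The column numbers behind these tweaks can be checked by listing each header field next to its index. A minimal sketch, assuming a POSIX shell and awk; header_sample.tsv is a hypothetical file name holding the header line from the question:

```shell
#!/bin/sh
# Recreate the header line from the question as a tab-separated sample file.
printf 'gene\traw_counts\tmedian_length_normalized\tRPKM\traw_counts\tmedian_length_normalized\tRPKM\traw_counts\tmedian_length_normalized\tRPKM\traw_counts\tmedian_length_normalized\tRPKM\traw_counts\n' > header_sample.tsv

# Print every header field together with its column number, then stop.
awk -F'\t' 'NR == 1 { for (i = 1; i <= NF; i++) print i, $i; exit }' header_sample.tsv
```

For this header, raw_counts shows up at columns 2, 5, 8, 11 and 14, which is why the loop starts at 2 and steps by 3.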

Modifying OP's code:

$ awk 'BEGIN {FS=OFS="\t"} {printf "%s",$1; for (i=2;i<=NF;i+=3) printf "%s%s",OFS,$i; print ""}' input.csv
gene    raw_counts      raw_counts      raw_counts      raw_counts      raw_counts
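Since the question runs the program from a file (awk -f prog.awk), the same fix can also live in a script file. A sketch assuming a POSIX shell; every_third.awk, sample.tsv and sample_out.tsv are hypothetical names, and the data row is made up for illustration:

```shell
#!/bin/sh
# Hypothetical sample: part of the header from the question plus one made-up data row.
printf 'gene\traw_counts\tmedian_length_normalized\tRPKM\traw_counts\tmedian_length_normalized\tRPKM\traw_counts\n' > sample.tsv
printf 'geneA\t10\t0.5\t1.2\t20\t0.7\t2.4\t30\n' >> sample.tsv

# every_third.awk: print column 1, then columns 2, 5, 8, ... tab-separated.
cat > every_third.awk <<'EOF'
BEGIN { FS = OFS = "\t" }
{
    printf "%s", $1
    for (i = 2; i <= NF; i += 3)
        printf "%s%s", OFS, $i
    print ""          # end the output record with a newline
}
EOF

# Same invocation style as in the question.
awk -f every_third.awk < sample.tsv > sample_out.tsv
cat sample_out.tsv
```

The trailing print "" matters: without it every input line would be glued onto a single output line.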

Answer:

You can also select the columns by header name rather than by position:

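awk 'BEGIN{FS=OFS="\t"}
FNR==1{
    # header line: remember column 1 and every column named raw_counts
    header[1]
    for(i=2;i<=NF;i++) if($i=="raw_counts") header[i]
}
{
    # every line (header included): print only the remembered columns
    sep=""
    for (i=1;i<=NF;i++)
        if(i in header) {printf("%s%s", sep, $i); sep=OFS}
    print ""
}' file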

On the first line (the header) it records column 1 plus every column named raw_counts and prints those headers; from then on it prints only the values in those columns.
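Both answers hard-code the layout of this particular file. If the goal is literally "the first and every nth column", the start column and step can be passed in with awk -v instead. A minimal sketch, assuming a POSIX shell; pick_columns, start, step and sample.tsv are hypothetical names, and the data row is made up:

```shell
#!/bin/sh
# Hypothetical sample: part of the header from the question plus one made-up data row.
printf 'gene\traw_counts\tmedian_length_normalized\tRPKM\traw_counts\tmedian_length_normalized\tRPKM\traw_counts\n' > sample.tsv
printf 'geneA\t10\t0.5\t1.2\t20\t0.7\t2.4\t30\n' >> sample.tsv

# pick_columns START STEP FILE (hypothetical helper):
# print column 1, then columns START, START+STEP, START+2*STEP, ... tab-separated.
# STEP must be at least 1, otherwise the loop never advances.
pick_columns() {
    awk -v start="$1" -v step="$2" 'BEGIN { FS = OFS = "\t" }
    {
        out = $1
        for (i = start; i <= NF; i += step)
            out = out OFS $i
        print out
    }' "$3"
}

pick_columns 2 3 sample.tsv
```

With the same sample, pick_columns 4 3 sample.tsv would select the RPKM columns instead (gene, RPKM, RPKM on the header line).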