I want to print the 1st column (gene) and all the raw_counts columns in a tab-separated file.
I've tried:
BEGIN {FS = "\t"}
{for (i = 3; i <= NF; i += 1) printf ("%s%c", $i, i + 1 <= NF ? "\t" : "\n");}
but the output is the same as the input. I run it as:
awk -f prog.awk < input.csv > output.csv
original header:
gene raw_counts median_length_normalized RPKM raw_counts median_length_normalized RPKM raw_counts median_length_normalized RPKM raw_counts median_length_normalized RPKM raw_counts
expected output (header):
gene raw_counts raw_counts raw_counts raw_counts raw_counts
CodePudding user response:
A few tweaks:
- start the loop counter at 2
- increment the loop counter by 3 on each pass
Modifying OP's code:
$ awk 'BEGIN {FS=OFS="\t"} {printf "%s",$1; for (i=2;i<=NF;i+=3) printf "%s%s",OFS,$i; print ""}' input.csv
gene raw_counts raw_counts raw_counts raw_counts raw_counts
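To see the stride applied to data rows as well as the header, here is a small self-contained sketch. The function name pick_every_third and the sample row are made up for illustration, and it assumes raw_counts really does sit in column 2 and every third column after it.

```shell
# Sketch only: keeps column 1 plus columns 2, 5, 8, ... of tab-separated input.
# pick_every_third is a hypothetical helper name, not from the question.
pick_every_third() {
    awk 'BEGIN { FS = OFS = "\t" }
         {
             printf "%s", $1                  # gene column
             for (i = 2; i <= NF; i += 3)     # every third column from 2
                 printf "%s%s", OFS, $i
             print ""                         # terminate the record
         }'
}

# Made-up sample: a header and one data row.
printf 'gene\traw_counts\tmedian_length_normalized\tRPKM\traw_counts\nA\t1\t2\t3\t4\n' |
    pick_every_third
```

If the column layout ever changes, a fixed stride will silently pick the wrong columns, so this is only safe when the file format is known to be regular.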
CodePudding user response:
You can do this:
awk 'BEGIN{FS=OFS="\t"}
FNR==1{
header[1]
for(i=2;i<=NF;i++) if($i=="raw_counts") header[i]
}
{
sep=""
for (i=1;i<=NF;i++)
if(i in header) {printf("%s%s", sep, $i); sep=OFS}
print ""
}' file
On the first line it records column 1 and every column whose header is raw_counts. Every line, the header included, is then printed with only those columns.
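A minimal sketch of the same header-driven idea, wrapped in a shell function so it can be tried on piped input. The name pick_raw_counts, the array name keep, and the sample data are assumptions for illustration; the sample deliberately uses an irregular layout to show that selection goes by header name and not by position.

```shell
# Sketch only: keeps column 1 plus every column whose header is "raw_counts".
# pick_raw_counts is a hypothetical helper name, not from the thread.
pick_raw_counts() {
    awk 'BEGIN { FS = OFS = "\t" }
         FNR == 1 {
             keep[1]                              # always keep the gene column
             for (i = 2; i <= NF; i++)
                 if ($i == "raw_counts") keep[i]  # remember matching positions
         }
         {
             sep = ""                             # no separator before first field
             for (i = 1; i <= NF; i++)
                 if (i in keep) { printf "%s%s", sep, $i; sep = OFS }
             print ""
         }'
}

# Made-up sample with raw_counts in columns 3 and 4.
printf 'gene\tRPKM\traw_counts\traw_counts\nB\t9\t5\t6\n' | pick_raw_counts
```

Resetting sep at the start of each record keeps data rows from beginning with a stray tab.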