I'm trying to create a new tsv from a tsv. In the first one I have a column with the sample names and several columns with numbers. In the second one I want each numeric column converted to a percentage of the sum of all numeric columns in that row.
I have done this, which kind of works, but the output is not tab separated:
cat multiqc_data/mqc_featurecounts_biotype_plot_1.txt | awk -v OFS='\t' -F'\t' 'NR>1{sum=0; for(i=1; i<=NF; i++) sum += $i; NF++; $NF=sum } 1' | tail -n 2 | awk '{for(i=2;i<=NF;i++)$i/=$42}1' | rev | cut -d" " -f2- | rev
Afterwards I tried column -t -s " " to transform it to tsv, but it didn't work and I don't know why.
Could you help me with this? If there is different code that works, I will be glad to try it. Thanks!
Answer:
I would do it with a single GNU AWK call, as follows. Let file.tsv contain (fields separated by tab characters):
A 10 10 10
B 10
C 10 20 30 40 50
then
awk 'BEGIN{FS=OFS="\t"}{t=0;for(i=2;i<=NF;i+=1){t+=$i};for(i=2;i<=NF;i+=1){$i/=t};print}' file.tsv
gives the output
A 0.333333 0.333333 0.333333
B 1
C 0.0666667 0.133333 0.2 0.266667 0.333333
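Explanation: I inform GNU AWK that the tab character is both the field separator (FS) and the output field separator (OFS). For each line I set t to zero, sum the columns starting at the 2nd, then divide each column starting at the 2nd by that sum, and print all columns. Because assigning to $i makes AWK rebuild the line using OFS, the output stays tab separated; in the original pipeline the second awk sets no OFS, so its rebuilt lines are joined with single spaces. Note that GNU AWK is not limited to standard input and can read files directly, without cat.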
(tested in gawk 4.2.1)
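The question asks for percentages, and the MultiQC file has a header line (the original pipeline skips it with NR>1). Below is a minimal sketch of the same approach that handles both, assuming line 1 is a header and column 1 holds the sample names. The file names sample.tsv and percent.tsv, the column names x, y, z, the two-decimal format and the NA marker for all-zero rows are illustrative choices, not part of the answer above. The header guard matters because a header line sums to 0, and gawk stops with a division-by-zero error.

```shell
#!/bin/sh
# Sketch: turn each numeric column into a percentage of its row total.
# Assumptions: line 1 is a header, column 1 holds sample names.
# sample.tsv, percent.tsv and the column names x, y, z are hypothetical.

# Build a small demo file; printf turns each \t into a real tab.
printf 'Sample\tx\ty\tz\nA\t10\t10\t10\nB\t10\t0\t0\nC\t0\t0\t0\n' > sample.tsv

awk 'BEGIN { FS = OFS = "\t" }
NR == 1 { print; next }    # copy the header line unchanged
{
    t = 0
    # row total of the numeric columns (column 2 onwards)
    for (i = 2; i <= NF; i++) t += $i
    # assigning to $i rebuilds the line with OFS, so it stays tab separated;
    # rows that sum to 0 get NA instead of triggering a division by zero
    for (i = 2; i <= NF; i++) $i = (t ? sprintf("%.2f", 100 * $i / t) : "NA")
    print
}' sample.tsv > percent.tsv

cat percent.tsv
```

If fractions are preferred, as in the output above, replace the sprintf expression with $i / t.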