Create tsv with percentages for every column bash

Time:07-26

I'm trying to create a new TSV from an existing one. The first file has a column with the sample names and several numeric columns. In the second file I want each numeric column expressed as a percentage of the sum of all the numeric columns in that row.

I have done this, which kind of works, but the output is not tab-separated:

cat multiqc_data/mqc_featurecounts_biotype_plot_1.txt | awk -v OFS='\t' -F'\t' 'NR>1{sum=0; for(i=1; i<=NF; i++) sum += $i; NF++; $NF=sum } 1' | tail -n 2 | awk '{for(i=2;i<=NF;i++)$i/=$42}1' | rev | cut -d" " -f2- | rev
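The tabs are lost in the second awk, which runs with the default single-space OFS. A minimal sketch of the same pipeline with tab separators at every stage, assuming the input has exactly one header row. The function name row_fractions_pipeline is hypothetical. The sum is appended with $(NF + 1) = sum (a portable alternative to NF++), the header is dropped with tail -n +2 rather than tail -n 2, and the second awk leaves out the sum column itself instead of relying on rev | cut.

```shell
#!/usr/bin/env bash
# Sketch: the question's pipeline with tabs kept at every stage.
# Assumption: tab-separated input whose first row is a header (hence NR > 1).
# Stage 1: append each data row's total (columns 2..NF) as an extra field.
# Stage 2: drop the header, then divide each value by the appended total ($NF)
#          and print every field except the total.
row_fractions_pipeline() {
  awk -F'\t' -v OFS='\t' 'NR > 1 { sum = 0; for (i = 2; i <= NF; i++) sum += $i; $(NF + 1) = sum } 1' "$1" |
    tail -n +2 |
    awk -F'\t' -v OFS='\t' '{ out = $1; for (i = 2; i < NF; i++) out = out OFS ($i / $NF); print out }'
}
```

Usage: row_fractions_pipeline multiqc_data/mqc_featurecounts_biotype_plot_1.txt > fractions.tsv. Once the second awk keeps tabs, rev | cut -f2- | rev (without -d" ") would also drop the sum column, because cut splits on tabs by default.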

Afterwards I tried to use column -t -s " " to convert it to TSV, but it didn't work, and I don't know why.
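column -t aligns columns by padding them with spaces, so its output is still not tab-separated. One way to turn space-separated awk output into TSV is to let awk rebuild each record with a tab as OFS. A minimal sketch, assuming no field (for example a sample name) contains spaces; spaces_to_tabs is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Convert whitespace-separated lines on stdin to tab-separated lines on stdout.
# Assumption: no field contains embedded spaces.
spaces_to_tabs() {
  # Assigning $1 to itself makes awk rebuild the record, joining fields with OFS (a tab).
  awk -v OFS='\t' '{ $1 = $1 } 1'
}
```

It can be appended to the end of a pipeline, for example ... | spaces_to_tabs > out.tsv.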

Could you help me with this? If there is different code that works, I'd be glad to try it. Thanks!

Answer:


I would do it with a single GNU AWK call, as follows. Let the tab-separated file.tsv contain

A   10  10  10
B   10
C   10  20  30  40  50
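The separators in the listing above are tab characters that the page presumably renders as spaces; if they are copied as spaces, FS="\t" sees each line as a single field. A small sketch that writes the same rows with real tabs; make_example_tsv is a hypothetical helper name, and the output path defaults to file.tsv.

```shell
#!/usr/bin/env bash
# Write the example rows with real tab characters to the given path (default: file.tsv).
make_example_tsv() {
  printf 'A\t10\t10\t10\nB\t10\nC\t10\t20\t30\t40\t50\n' > "${1:-file.tsv}"
}
```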

then

awk 'BEGIN{FS=OFS="\t"}{t=0;for(i=2;i<=NF;i+=1){t+=$i};for(i=2;i<=NF;i+=1){$i/=t};print}' file.tsv

gives the output

A   0.333333    0.333333    0.333333
B   1
C   0.0666667   0.133333    0.2 0.266667    0.333333

Explanation: I tell GNU AWK that the tab character is both the field separator (FS) and the output field separator (OFS). For each line I set t to zero, sum the columns from the 2nd onward, divide each column from the 2nd onward by that sum, and print all columns. Note that GNU AWK is not limited to standard input; it can read files directly, so cat is not needed.
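The command above yields fractions of 1, while the question asks for percentages. A minimal sketch of a variant, assuming any POSIX awk: it multiplies by 100, rounds to two decimals with sprintf, can pass a header row through unchanged (the question's MultiQC file appears to have one, given its NR>1), and writes 0.00 for rows whose total is zero, since division by zero is a fatal error in gawk. The function name row_percentages and its optional second argument are hypothetical.

```shell
#!/usr/bin/env bash
# Usage: row_percentages FILE [HAS_HEADER]
#   HAS_HEADER=1 prints line 1 unchanged; the default 0 treats every line as data.
row_percentages() {
  awk -v header="${2:-0}" '
    BEGIN { FS = OFS = "\t" }
    NR == 1 && header == 1 { print; next }
    {
      t = 0
      for (i = 2; i <= NF; i++) t += $i            # row total over the numeric columns
      for (i = 2; i <= NF; i++)                     # replace each value by its percentage
        $i = sprintf("%.2f", t ? 100 * $i / t : 0)  # all-zero rows become 0.00
      print
    }' "$1"
}
```

With the file.tsv above, row_percentages file.tsv prints A with 33.33 in each column, B with 100.00, and C with 6.67, 13.33, 20.00, 26.67 and 33.33. For the question's file, row_percentages multiqc_data/mqc_featurecounts_biotype_plot_1.txt 1 > percentages.tsv would keep the header line.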

(tested in gawk 4.2.1)
