Bash iterate through fields of a TSV file and divide it by the sum of the column

Time:07-12

I have a TSV file with several columns, and I would like to iterate through each field and divide it by the sum of its column:

Input:

A    1    2    1
B    1    0    3

Output:

A    0.5    1    0.25
B    0.5    0    0.75

I have the following to iterate through the fields, but I am not sure how I can find the sum of the column that the field is located in:

awk -v FS='\t' -v OFS='\t' '{for(i=2;i<=NF;i++){$i=$i/SUM_OF_COLUMN}} 1' input.tsv

CodePudding user response:

You may use this 2-pass awk:

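awk '
BEGIN {FS=OFS="\t"}
# First pass (NR == FNR): accumulate the sum of each column
NR == FNR {
   for (i=2; i<=NF; ++i)
      sum[i] += $i
   next
}
# Second pass: divide each field by its column sum (0 if the sum is 0)
{
   for (i=2; i<=NF; ++i)
      $i = (sum[i] ? $i/sum[i] : 0)
}
1' file file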

A       0.5     1       0.25
B       0.5     0       0.75
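
To try the two-pass command on the sample from the question, here is a minimal sketch. The function name `normalize_two_pass` and the temporary file are assumptions made for illustration; they are not part of the answer above. The file is passed to awk twice, so `NR == FNR` holds only while the first copy is being read.

```shell
# Hypothetical wrapper around the two-pass awk; takes one TSV file as its argument.
normalize_two_pass() {
  awk '
  BEGIN { FS = OFS = "\t" }
  NR == FNR {                       # first copy of the file: sum each column
     for (i = 2; i <= NF; ++i)
        sum[i] += $i
     next
  }
  {                                 # second copy: divide by the column sum
     for (i = 2; i <= NF; ++i)
        $i = (sum[i] ? $i / sum[i] : 0)
  }
  1' "$1" "$1"
}

# Write the sample input from the question to a temporary file and run it.
tmp=$(mktemp)
printf 'A\t1\t2\t1\nB\t1\t0\t3\n' > "$tmp"
normalize_two_pass "$tmp"
rm -f "$tmp"
```

Because the input is read twice, this approach needs a regular file; it cannot consume data arriving on a pipe.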

CodePudding user response:

With your shown samples, please try the following awk code, which makes a single pass over Input_file. It builds two arrays: sum, which holds the total of each column indexed by field number, and arr, which holds every field value indexed by line number and field number. In the END block it loops over all lines (up to FNR) and prints each stored value divided by the sum of its column, or N/A when that sum is zero.

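awk '
BEGIN{FS=OFS="\t"}
{
  arr[FNR,1]=$1                  # keep the label column as-is
  for(i=2;i<=NF;i++){
    sum[i]+=$i                   # running total for column i
    arr[FNR,i]=$i                # remember the value for the END block
  }
}
END{
  for(i=1;i<=FNR;i++){
    printf("%s\t",arr[i,1])
    for(j=2;j<=NF;j++){
      # print value/sum, or N/A when the column sum is zero
      printf("%s%s",sum[j]?(arr[i,j]/sum[j]):"N/A",j==NF?ORS:OFS)
    }
  }
}
'  Input_file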
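
Since the single-pass approach keeps every row in memory, it can also read from a pipe. The sketch below is a variation on the code above and not the answerer's exact program: the function name `normalize_single_pass` is hypothetical, and the `nf` variable is an added safeguard that records the widest row instead of relying on the value of NF inside END.

```shell
# Hypothetical helper: single-pass normalization that reads TSV from standard input.
normalize_single_pass() {
  awk '
  BEGIN { FS = OFS = "\t" }
  {
    if (NF > nf) nf = NF             # widest row seen so far (added safeguard)
    arr[NR, 1] = $1                  # label column, kept unchanged
    for (i = 2; i <= NF; i++) {
      sum[i] += $i                   # running total for column i
      arr[NR, i] = $i                # store the value for the END block
    }
  }
  END {
    for (i = 1; i <= NR; i++) {
      printf "%s", arr[i, 1]
      for (j = 2; j <= nf; j++)      # value/sum, or N/A for a zero-sum column
        printf "%s%s", OFS, (sum[j] ? arr[i, j] / sum[j] : "N/A")
      printf "%s", ORS
    }
  }'
}

# The last column sums to zero here, so it exercises the N/A branch.
printf 'A\t1\t2\t0\nB\t1\t0\t0\n' | normalize_single_pass
```

The trade-off is memory: the whole file is held in arr until END, whereas the two-pass version stores only one sum per column.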