I have a tsv file with several columns, and I would like to iterate through each field, and divide it by the sum of that column:
Input:
A 1 2 1
B 1 0 3
Output:
A 0.5 1 0.25
B 0.5 0 0.75
I have the following to iterate through the fields, but I am not sure how I can find the sum of the column that the field is located in:
awk -v FS='\t' -v OFS='\t' '{for(i=2;i<=NF;i++){$i=$i/SUM_OF_COLUMN}} 1' input.tsv
CodePudding user response:
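You may use this 2-pass awk, which reads the file twice: the first pass accumulates the sum of each column, and the second pass divides every field by its column sum: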
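awk '
BEGIN {FS=OFS="\t"}
# first pass: accumulate the sum of every numeric column
NR == FNR {
   for (i=2; i<=NF; i++)
      sum[i] += $i
   next
}
# second pass: divide each field by its column sum (0 if the sum is 0)
{
   for (i=2; i<=NF; i++)
      $i = (sum[i] ? $i/sum[i] : 0)
}
1' file file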
A 0.5 1 0.25
B 0.5 0 0.75
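To try this against the sample from the question, here is a minimal sketch that wraps the same 2-pass command in a shell function. The function name normalize_columns and the temporary file are only illustrative assumptions, not part of the answer above. It should print the expected output shown in the question.

```shell
# normalize_columns is a hypothetical helper name used for illustration.
# It needs a regular file, because awk has to read the input twice.
normalize_columns() {
  awk '
  BEGIN {FS=OFS="\t"}
  NR == FNR {                      # first pass: column sums
     for (i=2; i<=NF; i++)
        sum[i] += $i
     next
  }
  {                                # second pass: divide by the column sum
     for (i=2; i<=NF; i++)
        $i = (sum[i] ? $i/sum[i] : 0)
  }
  1' "$1" "$1"
}

# Build the tab-separated sample from the question and run the helper on it.
sample=$(mktemp)
printf 'A\t1\t2\t1\nB\t1\t0\t3\n' > "$sample"
normalize_columns "$sample"
rm -f "$sample"
```

Because the file name is passed twice, this will not work on piped input; the single-pass approach is the better fit when the data comes from a pipe.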
CodePudding user response:
With your shown samples, please try the following awk code, which makes a single pass over Input_file. It builds two arrays: sum, which holds the total of each column indexed by field number, and arr, which holds every field value indexed by line number and field number. The END block then loops over all lines (up to the value of FNR) and prints each stored value divided by the sum of its column, or N/A when that sum is zero.
awk '
BEGIN{FS=OFS="\t"}
{
  arr[FNR,1]=$1                 # keep the label from the first column
  for(i=2;i<=NF;i++){
    sum[i]+=$i                  # running total for column i
    arr[FNR,i]=$i               # remember the value for the END block
  }
}
END{
  for(i=1;i<=FNR;i++){
    printf("%s\t",arr[i,1])
    for(j=2;j<=NF;j++){
      printf("%s%s",sum[j]?(arr[i,j]/sum[j]):"N/A",j==NF?ORS:OFS)
    }
  }
}
' Input_file