I have a .bed file with 4 columns. The first column contains the chromosome number. I need to write a bash script that selects every row belonging to a specific chromosome, subtracts the second column from the third column for each of those rows (this gives the length of the gene), and then calculates the average length of the genes on that chromosome. I have to do this for every chromosome.
The code below calculates the average length over the whole table, but I need it separately for each chromosome.
`#!/bin/bash
input_bed=${1}
awk 'BEGIN {
FS="\t"
sum=0
}
{
sum += $3-$2
} END {
print sum / NR;
}' "${input_bed}"
#Exiting
exit`
CodePudding user response:
You can put a condition (pattern) before the line-processing block, so that the block only runs on input lines that satisfy it. Swap "1" for whichever chromosome you are investigating.
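input_bed=${1}
awk 'BEGIN {
FS="\t"
sum=0
}
# Pattern and opening brace must share a line; == compares, = would assign
$1 == "1" {
sum += $3-$2
n++
} END {
# Divide by the number of matching rows; NR counts every line in the file
if (n > 0) print sum / n;
}' "${input_bed}"
#Exiting
exit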
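If editing the script for each chromosome gets tedious, the chromosome can be passed in from the shell instead. The following is only a sketch: the function name `avg_len_for_chrom` and the awk variable `chr` are made up for illustration, and it assumes a tab-separated file with start in column 2 and end in column 3, as in the question.

```shell
#!/bin/bash
# Hypothetical helper: print the mean of (end - start) for one chromosome.
# Usage: avg_len_for_chrom CHROM FILE.bed
avg_len_for_chrom() {
    # -v copies the shell argument into the awk variable chr
    awk -v chr="$1" 'BEGIN { FS = "\t" }
    $1 == chr { sum += $3 - $2; n++ }
    END { if (n > 0) print sum / n }' "$2"
}
```

Calling `avg_len_for_chrom 1 genes.bed` would then print the average for chromosome 1, and nothing at all if no row matches (`genes.bed` is a placeholder file name).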
Alternatively, you can do it all in one run by saving the results to an associative array.
input_bed=${1}
awk 'BEGIN {
FS="\t"
}
{
# Accumulate total length and row count per chromosome
sum[$1] += $3-$2
cnt[$1] += 1
} END {
for (chromosome in cnt) {
print "Avg of Chromosome", chromosome, "is", sum[chromosome] / cnt[chromosome];
}
}' "${input_bed}"
#Exiting
exit
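To check the arithmetic of the one-pass approach, it may help to run it on a few made-up rows. This is a minimal sketch: the function name `avg_len_per_chrom` and the sample genes are invented for illustration. awk does not guarantee any order for `for (key in array)`, so the output is piped through `sort` to make it stable.

```shell
#!/bin/bash
# Hypothetical helper: one line per chromosome, "chromosome<TAB>mean length".
# Reads the BED file given as an argument, or standard input when none is given.
avg_len_per_chrom() {
    awk 'BEGIN { FS = OFS = "\t" }
    { sum[$1] += $3 - $2; cnt[$1]++ }
    END { for (c in cnt) print c, sum[c] / cnt[c] }' "$@" |
    sort -k1,1
}

# Made-up sample rows: chromosome, start, end, name
printf '1\t100\t200\tgeneA\n1\t300\t700\tgeneB\n2\t50\t150\tgeneC\n' | avg_len_per_chrom
```

Chromosome 1 has lengths 100 and 400, so its line reads 250; chromosome 2 has a single gene of length 100. The plain `sort -k1,1` orders names as text, so with real data "10" would come before "2".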