I have been trying to count the number of 1s for each species in a FASTA file that looks like this:
>111
1100101010
>102
1110000001
The desired output would be:
>111
5
>102
4
I know how to get the number of 1s in a file with:
grep -c 1 file
My problem is that I cannot find a way to keep track of the number of 1s for each species (instead of the total for the file).
CodePudding user response:
Assuming your FASTA file is formatted as you indicate (each header line followed by exactly one sequence line), and assuming awk is acceptable, the following might work:
while read -r one ; do
    echo "${one}"
    read -r two
    awk -F"1" '{print NF-1}' <<< "${two}"
done <fasta.txt
(Note: The awk command splits the sequence line on '1' and prints the number of resulting fields minus 1, which equals the number of 1s in the line.)
fasta.txt:
>111
1100101010
>102
1110000001
Output:
>111
5
>102
4
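The same field-splitting idea can also be applied in a single awk pass, without the shell loop. This is only a sketch, assuming every header line starts with '>' and every sequence line is non-empty (an empty line would print -1). The function name count_ones is hypothetical, and the sample file is recreated in a temporary file for illustration:

```shell
# Hypothetical helper: print each header followed by the number of 1s
# in the sequence line(s) that follow it.
count_ones() {
    awk -F 1 '
        /^>/ { print; next }   # header line: print it unchanged
        { print NF - 1 }       # sequence line: fields minus one = number of 1s
    ' "$1"
}

# Sample input from the question, written to a temporary file.
tmp=$(mktemp)
printf '%s\n' '>111' '1100101010' '>102' '1110000001' > "$tmp"
count_ones "$tmp"
rm -f "$tmp"
```

Note that if a sequence were wrapped over several lines, this sketch would print one count per line rather than one total per species.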
CodePudding user response:
grep -c 1 will give you the number of matching lines, not the total number of 1s. You could use grep -o to make it print only the matching parts of each matching line, each on a separate line, and then wc -l to count the number of lines.
while read -r species
do
    echo "$species"
    read -r seq
    # printf is more portable than echo -n
    printf '%s\n' "$seq" | grep -o 1 | wc -l
done < fasta_file
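To see the difference between the two grep invocations, here is a small sketch using the first sequence from the question. The variable name sequence is just for illustration, and grep -o is assumed to be available (it is in GNU and BSD grep):

```shell
sequence='1100101010'   # first sequence from the question

# grep -c counts matching lines: the one input line contains a 1,
# so the count is 1 no matter how many 1s the line holds.
printf '%s\n' "$sequence" | grep -c 1

# grep -o prints every match on its own line; wc -l counts those lines,
# which gives the number of 1s in the sequence.
printf '%s\n' "$sequence" | grep -o 1 | wc -l
```

Some wc implementations pad the count with leading spaces, so the second result may need trimming if the output has to match an exact format.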