I have a file (raw.txt) like this:
#chrom start end seq
#chrom start end seq
#chrom start end seq
chr1 214435102 214435132 AAACCGGTCAGCTT...
chr1 214435135 214435165 TCAATGGACTGTTC...
#chrom start end seq
chr1 214873901 214873931 CCAAATCCCTCAGG...
As you can see, some of the '#chrom' lines have results (the 3rd and 4th) and some do not (the 1st and 2nd).
What I am trying to do is read each line that starts with '#chrom' and then look at the line after it. If that next line also starts with '#chrom', print 0; if it starts with anything else, print 1. This should happen for every line that starts with '#chrom', without skipping any. Basically, I am trying to label the headers that have sequences. I suspect there is an easier way, but so far I have come up with these two lines of code:
awk '/#chrom/{getline; print}' raw.txt > nextLine.txt
awk '$1 == "#chrom" { print "0" } $1 != "#chrom" { print "1" }' nextLine.txt > labeled.txt
Expected output in labeled.txt:
0
0
1
1
I think the second command works well. However, the number of '#chrom' lines in raw.txt does not match the number of lines in nextLine.txt. If you could help me with that, I would appreciate it.
Thank you
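The mismatch comes from getline: when the line it consumes is itself a '#chrom' header, that header is never tested against /#chrom/, so the line after it is never read by getline. A small sketch to see this on the sample above, assuming a POSIX shell with awk; sample_input is a hypothetical helper that just prints the sample rows (sequences truncated as posted):

```shell
# Hypothetical helper: prints the sample rows from the question.
sample_input() {
  cat <<'EOF'
#chrom start end seq
#chrom start end seq
#chrom start end seq
chr1 214435102 214435132 AAACCGGTCAGCTT...
chr1 214435135 214435165 TCAATGGACTGTTC...
#chrom start end seq
chr1 214873901 214873931 CCAAATCCCTCAGG...
EOF
}

# There are four header lines...
sample_input | grep -c '^#chrom'

# ...but the /#chrom/ rule only fires three times, because the header on
# line 2 is swallowed by the getline that runs for line 1.
sample_input | awk '/#chrom/ { hdr = NR; getline; print "header on line " hdr " -> getline read line " NR }'
```

This should print 4, followed by three "header on line ..." lines (1 -> 2, 3 -> 4, 6 -> 7). The header on line 2 never gets its own follower written to nextLine.txt, which is why the counts differ.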
Answer:
This should do it:
awk 'BEGIN { chrom=0 }   # chrom==1: the previous line was an unlabeled #chrom header
{
  if ($1=="#chrom") {
    if (chrom==1) print 0; else chrom=1
  }
  else {
    if (chrom==1) print 1
    chrom=0
  }
}' raw.txt
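From tracing the flag logic (this is not stated in the answer): a header only gets its label when a later line arrives, so a '#chrom' on the very last line of the file produces no output. A sketch to check that, assuming the script is wrapped in a hypothetical shell function label_flag that reads stdin instead of raw.txt, with sample_input another hypothetical helper that prints the sample rows:

```shell
# Hypothetical helper: prints the sample rows from the question.
sample_input() {
  cat <<'EOF'
#chrom start end seq
#chrom start end seq
#chrom start end seq
chr1 214435102 214435132 AAACCGGTCAGCTT...
chr1 214435135 214435165 TCAATGGACTGTTC...
#chrom start end seq
chr1 214873901 214873931 CCAAATCCCTCAGG...
EOF
}

# The answer's awk program, reading stdin instead of raw.txt.
label_flag() {
  awk 'BEGIN { chrom=0 }
  {
    if ($1=="#chrom") {
      if (chrom==1) print 0; else chrom=1
    }
    else {
      if (chrom==1) print 1
      chrom=0
    }
  }'
}

# Sample input: 4 headers, 4 labels.
sample_input | label_flag

# Same input plus a trailing header: 5 headers, but still only 4 labels.
{ sample_input; echo '#chrom start end seq'; } | label_flag
```

Both calls should print 0, 0, 1, 1. If a trailing header should be labeled 0, adding an END { if (chrom==1) print 0 } rule to the program should cover it.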
Answer:
One awk idea:
awk '
{ if (prev=="#chrom") # for 1st line of input prev==""
print ($1 == "#chrom" ? 0 : 1) # use ternary operator to determine output
prev=$1
}
' raw.txt
or as a one-liner:
awk '{if (prev=="#chrom") print ($1 == "#chrom" ? 0 : 1); prev=$1}' raw.txt
This generates:
0
0
1
1
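For comparison, the question's look-ahead idea can also work with getline, as long as a header that getline reads is labeled inside the same loop instead of being skipped. A sketch, assuming POSIX awk; label_lookahead and sample_input are hypothetical helper names. Unlike the two scripts above, it labels a '#chrom' on the last line as 0:

```shell
# Hypothetical helper: prints the sample rows from the question.
sample_input() {
  cat <<'EOF'
#chrom start end seq
#chrom start end seq
#chrom start end seq
chr1 214435102 214435132 AAACCGGTCAGCTT...
chr1 214435135 214435165 TCAATGGACTGTTC...
#chrom start end seq
chr1 214873901 214873931 CCAAATCCCTCAGG...
EOF
}

# Look-ahead labeling: every header read by getline becomes the current
# header of the same loop, so no header is skipped.
label_lookahead() {
  awk '
  $1 == "#chrom" {
    while ((getline line) > 0) {
      split(line, f)
      if (f[1] == "#chrom") { print 0; continue }  # next line is a header: 0, and it is now the current header
      print 1                                      # next line has data: 1
      next                                         # resume normal reading after that data line
    }
    print 0  # header on the last line: nothing follows it
  }'
}

sample_input | label_lookahead
```

On the sample this should print 0, 0, 1, 1; with an extra '#chrom' line appended it should print a fifth label, 0.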
Answer:
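As in life, in software it's much easier to do things based on what HAS happened than on what WILL happen. So don't write requirements based on what the NEXT line of input will be; write them based on what the PREVIOUS line of input was. You'll find the matching code easier to figure out, and it will be simpler than code that tries to determine the next line of input. In the script below, every '#chrom' line after the first prints the label for the header before it (0 if the line just before it was also a header, 1 otherwise), and the END block prints the label for the last header.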
$ cat tst.awk
($1 == "#chrom") && (NR > 1) {
print ( prev == "#chrom" ? 0 : 1 )
}
{ prev = $1 }
END {
print ( prev == "#chrom" ? 0 : 1 )
}
$ awk -f tst.awk file
0
0
1
1
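Because each label is printed one header late (at the next '#chrom', or in END for the last one), it can be handy to print which header a label belongs to. A sketch of that variant of tst.awk, assuming you want the header's line number (NR) in front of each label; label_with_line and sample_input are hypothetical names:

```shell
# Hypothetical helper: prints the sample rows from the question.
sample_input() {
  cat <<'EOF'
#chrom start end seq
#chrom start end seq
#chrom start end seq
chr1 214435102 214435132 AAACCGGTCAGCTT...
chr1 214435135 214435165 TCAATGGACTGTTC...
#chrom start end seq
chr1 214873901 214873931 CCAAATCCCTCAGG...
EOF
}

# Same previous-line logic as tst.awk, but remembers the header line number.
label_with_line() {
  awk '
  $1 == "#chrom" {
    if (hdr) print hdr, (prev == "#chrom" ? 0 : 1)  # label the previous header
    hdr = NR                                         # remember this header
  }
  { prev = $1 }
  END { if (hdr) print hdr, (prev == "#chrom" ? 0 : 1) }
  '
}

sample_input | label_with_line
```

On the sample this should print "1 0", "2 0", "3 1" and "6 1". Guarding on hdr instead of NR > 1 also avoids printing a spurious label when the file does not start with a '#chrom' line.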