I have multiple files containing this information:
sP12345.txt
COMMENT Method: conceptual translation.
FEATURES Location/Qualifiers
source 1..3024
/organism="H"
/isolate="sP12345"
/isolation_source="blood"
/host="Homo sapiens"
/db_xref="taxon:11103"
/collection_date="31-Mar-2014"
/note="genotype: 3"
sP4567.txt
COMMENT Method: conceptual translation.
FEATURES Location/Qualifiers
source 1..3024
/organism="H"
/isolate="sP4567"
/isolation_source="blood"
/host="Homo sapiens"
/db_xref="taxon:11103"
/collection_date="31-Mar-2014"
/note="genotype: 2"
Now I would like to find the /note="genotype: 3" line,
copy only the number that comes after genotype:
into a new text file, and print the name of the file it was taken from as column 2.
Expected Output:
3 sP12345
2 sP4567
I tried this code, but it only prints the first column and not the filename:
awk -F'note="genotype: ' -v OFS='\t' 'FNR==1{ c} NF>1{print $2, c}' *.txt > output_file.txt
CodePudding user response:
You may use:
awk '/\/note="genotype: /{gsub(/^.* |"$/, ""); f=FILENAME; sub(/\.[^.]+$/, "", f); print $0 "\t" f}' sP*.txt
3 sP12345
2 sP4567
CodePudding user response:
$ awk -v OFS='\t' 'sub(/\/note="genotype:/,""){print $0+0, FILENAME}' sP12345.txt sP4567.txt
3 sP12345.txt
2 sP4567.txt
CodePudding user response:
You can do:
awk '/\/note="genotype:/{split($0,a,": "); print a[2]+0,"\t",FILENAME}' sP*.txt
3 sP12345.txt
2 sP4567.txt
CodePudding user response:
With your shown samples, in GNU awk, please try the following awk code.
awk -v RS='/note="genotype: [0-9]*"' '
RT{
gsub(/.*: |"$/,"",RT)
print RT,FILENAME
nextfile
}
' *.txt
Explanation: All .txt files are passed to the GNU awk program. RS (the record separator) is set to the regex /note="genotype: [0-9]*", as per the shown samples and requirement. In the main block, RT (the GNU awk variable holding the text that matched RS) is non-empty whenever a record ended at such a match. gsub (global substitution) then removes everything up to the colon and the space after it, as well as the trailing ", from RT. The program prints the value of RT followed by the current file's name. nextfile skips the rest of the current file and moves straight to the next one, which saves some time.
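RT is a GNU extension, so on a system whose awk is not gawk the same idea can be sketched with match() and substr() instead. This is only a minimal sketch under a few assumptions: the scratch directory, the trimmed-down sample files, and the variable names (tmpdir, result, line, name) are made up for illustration, the files are assumed to end in .txt, and mktemp -d is assumed to be available (it is common but not required by POSIX).

```shell
#!/bin/sh
# Recreate two trimmed-down sample files in a scratch directory
# (hypothetical content, based on the samples shown in the question).
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1

printf '%s\n' 'source 1..3024' '/isolate="sP12345"' '/note="genotype: 3"' > sP12345.txt
printf '%s\n' 'source 1..3024' '/isolate="sP4567"' '/note="genotype: 2"' > sP4567.txt

# match() sets RSTART/RLENGTH to the position of the genotype note,
# so no GNU-only RT variable is needed.
result=$(awk -v OFS='\t' '
    match($0, /\/note="genotype: [0-9]+/) {
        line = substr($0, RSTART, RLENGTH)   # e.g. /note="genotype: 3
        sub(/^.*: /, "", line)               # keep only the number
        name = FILENAME
        sub(/\.txt$/, "", name)              # sP12345.txt -> sP12345
        print line, name
    }
' sP12345.txt sP4567.txt)

printf '%s\n' "$result"
```

The files are listed explicitly here so the output order is predictable; with sP*.txt the order follows the shell's glob sorting. To get the file asked for in the question, redirect the awk output to output_file.txt instead of capturing it in a variable.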