Take string from multiple files and copy to new file and print filename into second column in bash

I have multiple files containing this information:

sP12345.txt

COMMENT     Method: conceptual translation.
FEATURES             Location/Qualifiers
     source          1..3024
                     /organism="H"
                     /isolate="sP12345"
                     /isolation_source="blood"
                     /host="Homo sapiens"
                     /db_xref="taxon:11103"
                     /collection_date="31-Mar-2014"
                     /note="genotype: 3"

sP4567.txt

COMMENT     Method: conceptual translation.
FEATURES             Location/Qualifiers
     source          1..3024
                     /organism="H"
                     /isolate="sP4567"
                     /isolation_source="blood"
                     /host="Homo sapiens"
                     /db_xref="taxon:11103"
                     /collection_date="31-Mar-2014"
                     /note="genotype: 2"

Now I would like to take the line /note="genotype: 3", copy only the number that follows genotype: into a new text file, and print the name of the file it was taken from as column 2.

Expected Output:

3  sP12345
2  sP4567

I tried this code, but it only prints the first column and not the filename:

awk -F'note="genotype: ' -v OFS='\t' 'FNR==1{ ++c} NF>1{print $2, c}' *.txt > output_file.txt

CodePudding user response：

You may use:

awk '/\/note="genotype: /{gsub(/^.* |"$/, ""); f=FILENAME; sub(/\.[^.]+$/, "", f); print $0 "\t" f}' sP*.txt

3   sP12345
2   sP4567

CodePudding user response：

$ awk -v OFS='\t' 'sub(/\/note="genotype:/,""){print $0+0, FILENAME}' sP12345.txt sP4567.txt
3       sP12345.txt
2       sP4567.txt

CodePudding user response：

You can do:

awk '/\/note="genotype:/{split($0,a,": "); print a[2]+0,"\t",FILENAME}' sP*.txt
3    sP12345.txt
2    sP4567.txt

CodePudding user response：

With your shown samples, please try the following GNU awk code.

awk -v RS='/note="genotype: [0-9]*"' '
RT{
  gsub(/.*: |"$/,"",RT)
  print RT,FILENAME
  nextfile
}
' *.txt

Explanation: All .txt files are passed to the GNU awk program. RS (the record separator) is set to the regex /note="genotype: [0-9]*", as per the shown samples and the requirement. In the main block, whenever RT (the text that matched RS) is non-empty, gsub (global substitution) removes everything up to the colon and the following space, plus the trailing ", from RT. The program then prints RT followed by the current file's name. nextfile skips the rest of the current file and moves straight to the next one, which saves some time.
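The same steps (match the note line, keep only the number, print it next to the file name) can also be sketched without GNU-specific features such as RT. The following is a minimal, self-contained sketch in POSIX awk, not the method of any answer above. The scratch directory, the trimmed sample files, and the names n, f, and result are assumptions made for illustration. It also strips the .txt extension so the result matches the expected output in the question.

```shell
#!/bin/sh
# Hypothetical scratch directory with two sample files trimmed from the question.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' \
  '     source          1..3024' \
  '                     /isolate="sP12345"' \
  '                     /note="genotype: 3"' > sP12345.txt
printf '%s\n' \
  '     source          1..3024' \
  '                     /isolate="sP4567"' \
  '                     /note="genotype: 2"' > sP4567.txt

# Match the note line, keep only the number, and drop the file extension.
result=$(awk -v OFS='\t' '
/\/note="genotype: / {
  n = $0
  sub(/.*genotype: /, "", n)   # remove everything up to the value
  sub(/".*/, "", n)            # remove the closing quote
  f = FILENAME
  sub(/\.[^.]+$/, "", f)       # sP12345.txt becomes sP12345
  print n, f
}' sP*.txt)

printf '%s\n' "$result"
```

To write the result to a file instead, redirect the awk output, for example with > output_file.txt as in the question. Unlike the GNU awk version, this sketch reads each file to the end rather than skipping ahead with nextfile.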