bash awk: extract specific information from a set of files

Time:12-10

I am using a bash script to extract some information from log files located in a directory and save a summary in a separate file. At the bottom of each log file, there is a table like:

mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
----- ------------ ---------- ----------
   1       -6.961          0          0
   2       -6.797      2.908      4.673
   3       -6.639      27.93      30.19
   4       -6.204      2.949      6.422
   5       -6.111      24.92      28.55
   6       -6.058      2.836      7.608
   7       -5.986      6.448      10.53
   8        -5.95      19.32      23.99
   9       -5.927      27.63      30.04
  10       -5.916      27.17      31.29
  11       -5.895      25.88      30.23
  12       -5.835      26.24      30.36

From this I need to take only the value in the second column of the first row (-6.961) and write it, together with the name of the log, as one line in a new ranking_${output}.log:

log_name -6.961

so for 5 processed logs it should be something like:

# ranking_${output}.log
log_name1 -X.XXX
log_name2 -X.XXX
log_name3 -X.XXX
log_name4 -X.XXX
log_name5 -X.XXX

Here is a simple bash workflow, which takes all the lines from the ranking table and saves them together with the name of the log file:

#!/bin/bash
home="$PWD"
# folder containing all *.log files
results="${home}/results"

# loop over each log file; save its name and the whole ranking table
for log in "${results}"/*.log; do
  log_name=$(basename "$log" .log)
  echo "$log_name" >> "${results}/ranking_${output}.log"
  tail -n 12 "$log" >> "${results}/ranking_${output}.log"
done

Could you suggest an AWK routine that would select only the top value, located on the first row of each table? This is an AWK example that I used for another format, which does not work here:

awk -F', *' 'FNR==2 {f=FILENAME; 
                     sub(/.*\//,"",f);
                     sub(/_.*/ ,"",f);
                     printf("%s: %s\n", f, $5) }' ${results}/*.log >> ${results}/ranking_${output}.log

Answer:

With awk: if the first column contains 1, print the filename and the second column to the file output:

awk '$1=="1"{print FILENAME, $2}' *.log > output
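Note that `$1=="1"` matches any line whose first field is 1, not only the first table row. If a log could contain such a line elsewhere, one alternative is to key on the dashed separator and take the row that follows it. This is only a sketch: the file names, the second affinity value and the "1 warning(s) reported" line are made up for the demo.

```shell
#!/bin/bash
# Hypothetical demo data: two small logs in a throwaway directory.
demo_dir=$(mktemp -d)

cat > "$demo_dir/ligand_a.log" <<'EOF'
mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
----- ------------ ---------- ----------
   1       -6.961          0          0
   2       -6.797      2.908      4.673
EOF

# This log has an invented extra line that starts with "1".
cat > "$demo_dir/ligand_b.log" <<'EOF'
1 warning(s) reported
mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
----- ------------ ---------- ----------
   1       -7.214          0          0
   2       -6.540      1.875      3.102
EOF

# Run inside the directory so FILENAME carries no path.
top_scores=$(cd "$demo_dir" && awk '
  FNR == 1 { take = 0 }                      # reset the flag for each new file
  take     { print FILENAME, $2; take = 0 }  # first row after the separator
  /^-----/ { take = 1 }                      # dashed line under the table header
' *.log)

echo "$top_scores"
rm -rf "$demo_dir"
```

The flag is set on the separator line and consumed by the next line, so only one row per table is printed and the stray "1 ..." line is ignored.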

Update to remove path and suffix (.log):

awk '$1=="1"{f=FILENAME; sub(/.*\//,"",f); sub(/\.log$/,"",f); print f, $2}' *.log > output
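Plugged into the layout from the question, this could look roughly like the sketch below. It assumes the logs sit in `$results` and that `output` is set somewhere (the question never defines it, so "demo" is a placeholder here). Because ranking_${output}.log also matches *.log, the sketch skips files whose name starts with ranking_ and writes to a temporary file first. The demo logs and the second affinity value are invented.

```shell
#!/bin/bash
# Demo setup (hypothetical): a throwaway results directory with two logs.
results=$(mktemp -d)
output="demo"   # assumed placeholder; not defined in the original script

cat > "$results/log_name1.log" <<'EOF'
mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
----- ------------ ---------- ----------
   1       -6.961          0          0
   2       -6.797      2.908      4.673
EOF

cat > "$results/log_name2.log" <<'EOF'
mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
----- ------------ ---------- ----------
   1       -7.214          0          0
   2       -6.540      1.875      3.102
EOF

ranking="${results}/ranking_${output}.log"

# Write to a temporary name (it does not end in .log) and move it into
# place afterwards, so the summary is never read while being written.
awk '
  FNR == 1 {                 # new input file: derive its short name once
    f = FILENAME
    sub(/.*\//, "", f)       # drop the directory part
    sub(/\.log$/, "", f)     # drop the .log suffix
  }
  f ~ /^ranking_/ { next }   # skip summaries left over from earlier runs
  $1 == "1" { print f, $2 }  # first table row: log name and top affinity
' "${results}"/*.log > "${ranking}.tmp"
mv "${ranking}.tmp" "$ranking"

summary=$(cat "$ranking")
echo "$summary"
rm -rf "$results"
```

Copying FILENAME into `f` instead of editing it in place keeps the built-in variable intact, and anchoring the suffix pattern with `$` avoids stripping ".log" from the middle of a name.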