I am working with log files arranged in the following format:
Finding intramodel H-bonds
Constraints relaxed by 0.5 angstroms and 20 degrees
Models used:
1.1 SarsCov2_structure49R_nsp5holo_rep1.pdb
1.2 SarsCov2_structure49R_nsp5holo_rep1.pdb
1.3 SarsCov2_structure49R_nsp5holo_rep1.pdb
1.4 SarsCov2_structure49R_nsp5holo_rep1.pdb
1.5 SarsCov2_structure49R_nsp5holo_rep1.pdb
1.6 SarsCov2_structure49R_nsp5holo_rep1.pdb
1.7 SarsCov2_structure49R_nsp5holo_rep1.pdb
1.8 SarsCov2_structure49R_nsp5holo_rep1.pdb
1.9 SarsCov2_structure49R_nsp5holo_rep1.pdb
1.10 SarsCov2_structure49R_nsp5holo_rep1.pdb
1.11 SarsCov2_structure49R_nsp5holo_rep1.pdb
1.12 SarsCov2_structure49R_nsp5holo_rep1.pdb
1.13 SarsCov2_structure49R_nsp5holo_rep1.pdb
1.14 SarsCov2_structure49R_nsp5holo_rep1.pdb
14 H-bonds
H-bonds (donor, acceptor, hydrogen, D..A dist, D-H..A dist):
SarsCov2_structure49R_nsp5holo_rep1.pdb #1.1/? ASN 142 ND2 SarsCov2_structure49R_nsp5holo_rep1.pdb #1.1/A UNL 888 O SarsCov2_structure49R_nsp5holo_rep1.pdb #1.1/? ASN 142 1HD2 3.102 2.145
SarsCov2_structure49R_nsp5holo_rep1.pdb #1.3/? GLU 166 N SarsCov2_structure49R_nsp5holo_rep1.pdb #1.3/A UNL 888 O SarsCov2_structure49R_nsp5holo_rep1.pdb #1.3/? GLU 166 H 3.011 2.024
SarsCov2_structure49R_nsp5holo_rep1.pdb #1.4/? GLU 166 N SarsCov2_structure49R_nsp5holo_rep1.pdb #1.4/A UNL 888 O SarsCov2_structure49R_nsp5holo_rep1.pdb #1.4/? GLU 166 H 3.037 2.132
SarsCov2_structure49R_nsp5holo_rep1.pdb #1.5/? HIS 163 NE2 SarsCov2_structure49R_nsp5holo_rep1.pdb #1.5/A UNL 888 O no hydrogen 3.388 N/A
SarsCov2_structure49R_nsp5holo_rep1.pdb #1.5/? GLU 166 N SarsCov2_structure49R_nsp5holo_rep1.pdb #1.5/A UNL 888 O SarsCov2_structure49R_nsp5holo_rep1.pdb #1.5/? GLU 166 H 2.806 1.792
SarsCov2_structure49R_nsp5holo_rep1.pdb #1.7/? THR 26 N SarsCov2_structure49R_nsp5holo_rep1.pdb #1.7/A UNL 888 O SarsCov2_structure49R_nsp5holo_rep1.pdb #1.7/? THR 26 H 3.093 2.142
SarsCov2_structure49R_nsp5holo_rep1.pdb #1.7/? GLY 143 N SarsCov2_structure49R_nsp5holo_rep1.pdb #1.7/A UNL 888 O SarsCov2_structure49R_nsp5holo_rep1.pdb #1.7/? GLY 143 H 3.030 2.193
SarsCov2_structure49R_nsp5holo_rep1.pdb #1.9/? GLN 189 NE2 SarsCov2_structure49R_nsp5holo_rep1.pdb #1.9/A UNL 888 O SarsCov2_structure49R_nsp5holo_rep1.pdb #1.9/? GLN 189 2HE2 3.052 2.301
SarsCov2_structure49R_nsp5holo_rep1.pdb #1.10/? GLU 166 N SarsCov2_structure49R_nsp5holo_rep1.pdb #1.10/A UNL 888 O SarsCov2_structure49R_nsp5holo_rep1.pdb #1.10/? GLU 166 H 2.854 1.868
SarsCov2_structure49R_nsp5holo_rep1.pdb #1.12/? GLY 143 N SarsCov2_structure49R_nsp5holo_rep1.pdb #1.12/A UNL 888 O SarsCov2_structure49R_nsp5holo_rep1.pdb #1.12/? GLY 143 H 3.103 2.070
SarsCov2_structure49R_nsp5holo_rep1.pdb #1.13/? GLY 143 N SarsCov2_structure49R_nsp5holo_rep1.pdb #1.13/A UNL 888 O SarsCov2_structure49R_nsp5holo_rep1.pdb #1.13/? GLY 143 H 3.161 2.224
SarsCov2_structure49R_nsp5holo_rep1.pdb #1.13/? CYS 145 SG SarsCov2_structure49R_nsp5holo_rep1.pdb #1.13/A UNL 888 O SarsCov2_structure49R_nsp5holo_rep1.pdb #1.13/? CYS 145 HG 3.421 2.842
SarsCov2_structure49R_nsp5holo_rep1.pdb #1.14/? ASN 142 ND2 SarsCov2_structure49R_nsp5holo_rep1.pdb #1.14/A UNL 888 O SarsCov2_structure49R_nsp5holo_rep1.pdb #1.14/? ASN 142 2HD2 3.055 2.465
SarsCov2_structure49R_nsp5holo_rep1.pdb #1.14/? CYS 145 N SarsCov2_structure49R_nsp5holo_rep1.pdb #1.14/A UNL 888 O SarsCov2_structure49R_nsp5holo_rep1.pdb #1.14/? CYS 145 H 2.924 2.143
I need to find the first occurrence of the "GLU 166 N" pattern and print the model number that appears just before it on the same line, in the form #1.number/?. In the example above the detected number should be 3, since the associated identifier is #1.3/?.
I would start with basic pattern detection:
awk '/GLU 166 N/' file
but how do I correctly extract the number located just before the pattern and print it as output? Finally, if the pattern cannot be found, I would like the script to print 1.
CodePudding user response:
$ awk -vn=1 '/GLU 166 N/ {gsub(/.*\.|\/\?/,"",$2); n=$2; exit} END {print n}' file
3
$ awk -vn=1 '/GLU 166 N/ {gsub(/.*\.|\/\?/,"",$2); n=$2; exit} END {print n}' /dev/null
1
What you are looking for is in the second field ($2). gsub(/.*\.|\/\?/,"",$2) replaces, in $2, all leading characters up to (and including) the period, and the trailing /?, with the empty string.
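To see the substitution in isolation, here is a small sketch that applies the same gsub() regex to a single identifier instead of to $2 of a log line. The helper name strip_model_id is hypothetical and exists only for illustration. Note that relying on $2 assumes the file name contains no spaces.

```shell
# strip_model_id (hypothetical helper): apply the answer's gsub() regex
# to one identifier such as "#1.3/?" and print what remains.
strip_model_id() {
    # ".*\." removes everything up to and including the period,
    # "\/\?" removes the trailing "/?".
    printf '%s\n' "$1" | awk '{ gsub(/.*\.|\/\?/, ""); print }'
}

strip_model_id '#1.3/?'    # prints 3
strip_model_id '#1.14/?'   # prints 14
```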
CodePudding user response:
Using GNU awk for the 3rd arg to match():
$ awk 'match($0,/([0-9]+).. GLU 166 N /,a){print a[1]; exit}' file
3
or using any awk:
$ awk 'match($0,/[0-9]+.. GLU 166 N /){sub("/.*",""); print substr($0,RSTART); exit}' file
3
$ awk 'match($0,/[0-9]+.. GLU 166 N /){print substr($0,RSTART,RLENGTH-13); exit}' file
3
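The match()-based commands above print nothing when the pattern is absent, whereas the question asks for 1 in that case. A minimal sketch that combines the portable match() approach with a default value could look like this; the function name first_glu166 is hypothetical, and the regex spells out the "/?" literally instead of using "..".

```shell
# first_glu166 (hypothetical name): print N from the first
# "#1.N/? GLU 166 N " entry in the given file, or 1 if there is none.
# Uses only POSIX awk features (match, RSTART, RLENGTH).
first_glu166() {
    awk -v n=1 '
        match($0, /[0-9]+\/\? GLU 166 N /) {
            # The match starts at the model number; the trailing
            # "/? GLU 166 N " is 13 characters long, so cut it off.
            n = substr($0, RSTART, RLENGTH - 13)
            exit            # stop reading; the END rule still runs
        }
        END { print n }     # n keeps its default of 1 if nothing matched
    ' "$1"
}
```

Calling first_glu166 file on the log from the question should print 3, and first_glu166 /dev/null should print 1.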
CodePudding user response:
If GNU awk, which supports the gensub function, is available, would you please try:
awk '/GLU 166 N/ {
    print gensub(/^.*#1\.([0-9]+)\/\? GLU 166 N.*$/, "\\1", 1)
    exit
}' file
The regex ^.*#1\.([0-9]+)\/\? GLU 166 N.*$ matches a line containing the substring #1.<number>/? GLU 166 N. The <number> portion, enclosed in parentheses in the regex as ([0-9]+), is captured as group 1; gensub then replaces the entire line with group 1, specified by the replacement \\1, and the result is printed.
Alternatively, with GNU sed you can say:
sed -nE '0,/GLU 166 N/s|^.*#1\.([0-9]+)/\? GLU 166 N.*|\1|p' file
The address 0,/pattern/, where the starting line 0 is specific to GNU sed, limits the substitution to the lines up to and including the first pattern match, so only the first occurrence is printed.
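Neither the gensub nor the sed command above prints the fallback value 1 when the pattern is missing. One way to add it, sketched here under the assumption that your sed supports -E (GNU and BSD sed do), is to quit at the first matching line and let the shell supply the default. The function name first_glu166_sed is hypothetical.

```shell
# first_glu166_sed (hypothetical name): sed-based variant with a default.
first_glu166_sed() {
    # On the first line containing "GLU 166 N": reduce the line to the
    # captured model number, print it, then quit without reading further.
    n=$(sed -nE '/GLU 166 N/{s|^.*#1\.([0-9]+)/\? GLU 166 N.*|\1|p;q;}' "$1")
    # ${n:-1} expands to 1 when sed printed nothing.
    printf '%s\n' "${n:-1}"
}
```

Because q stops sed at the first matching line, this variant does not need the GNU-specific 0,/pattern/ address.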