I have a long file that looks like this:
ATOM 55 CE1 LIG X 1 -2.921 4.159 -10.046 1.00 0.00 LIGA
I need to take the first letter of the third column, in this case C (but it changes by line), and replace my last column, LIGA, with this character. I need to do this while ensuring the spacing between my 12th and 13th columns is 11, as shown below. I need it to be identical to the line below for my program to understand it.
ATOM 55 CE1 LIG X 1 -4.950 9.318 4.387 1.00 0.00 C
I managed to copy the first letter of the third column into a different file, then delete the 13th column from the original file, and paste the different file into the original file with the lines below. However, I can't find a way to fix the spacing.
cut -c 14 original.pdb > different.pdb
perl -pi -e 's/LIGA//g' original.pdb
paste original.pdb different.pdb >> joint.pdb
mv joint.pdb original.pdb
I know awk may work here. I haven't been able to achieve it. I appreciate the help!
CodePudding user response:
1st solution: With your shown samples and attempts, please try the following awk code. Written and tested in GNU awk (the three-argument form of match() is a GNU awk extension).
awk '
match($0,/(^[^[:space:]]+[[:space:]]+[^[:space:]]+[[:space:]]+)(.)([^[:space:]]*.*[[:space:]]+)/,arr){
print arr[1] arr[2] arr[3] arr[2]
}
' Input_file
2nd solution: Using sed with its -E option to enable ERE here.
sed -E 's/(^[^[:space:]]+[[:space:]]+[^[:space:]]+[[:space:]]+)(.)([^[:space:]]*.*[[:space:]]+).*/\1\2\3\2/' Input_file
Here is the Online demo for the shown regex, (^[^[:space:]]+[[:space:]]+[^[:space:]]+[[:space:]]+)(.)([^[:space:]]*.*[[:space:]]+), for understanding purposes. (NOTE: the regex used on the site is a bit different, to satisfy the site's requirements; use only the regex shown in the code here.)
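If GNU awk is not available, the same idea can be sketched with plain awk string functions. This is only a sketch under a few assumptions: fields are separated by blanks, the value to replace is always the last field on the line, and lines with fewer than three fields should pass through unchanged. The function name replace_last_field is made up for illustration.

```shell
# Sketch: swap the last field for the first character of field 3,
# leaving all text (and spacing) before the last field untouched.
replace_last_field() {
    awk '
    NF < 3 { print; next }            # pass short or blank lines through unchanged
    {
        c = substr($3, 1, 1)          # first character of the third field
        line = $0
        sub(/[ \t]+$/, "", line)      # ignore trailing blanks, if any
        # cut off the old last field, keep the spacing that precedes it
        print substr(line, 1, length(line) - length($NF)) c
    }' "$@"
}

# Example with one of the sample lines:
printf '%s\n' 'ATOM 55 CE1 LIG X 1 -2.921 4.159 -10.046 1.00 0.00 LIGA' | replace_last_field
```

Like the two solutions above, this puts the new character where the old last field started, so the spacing in front of it is whatever the input line already had.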
CodePudding user response:
perl -ape '$lc = substr $F[2],0,1; s/$F[11]/   $lc/' original.pdb
- Use -a to autosplit into @F
- Use -p to loop, -e to execute inline program
- $lc = substr $F[2],0,1 - get first char of 3rd col as variable $lc
- s/$F[11]/   $lc/ - replace 12th column with 3 spaces then $lc
This should get you close. I can't exactly follow the column counts and space counts.
It just relies on the 12th col being a unique string on the line, so that it can be replaced with $lc.
This also depends on 12th column 'LIGA' always being 4 chars. If that field is variable width, you could always replace all the chars in it with a space, and then replace the last char:
perl -ape '$lc = substr $F[2],0,1; ($new = $F[11]) =~ s/./ /g; $new =~ s/.$/$lc/; s/$F[11]/$new/' original.pdb
... again, $F[11] must be a unique string on the line, or an earlier occurrence will be replaced instead. But relying on that means you keep the char spacing of the original.
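To drop the uniqueness requirement, the replacement can be anchored to the end of the line instead of searching for the field's text. Here is a rough awk sketch of that idea, assuming the value to overwrite is always the last field and that the new character should be right-aligned in the width the old field occupied. The function name pad_last_field is made up for illustration.

```shell
# Sketch: overwrite the last field in place, keeping its width.
pad_last_field() {
    awk '
    NF < 3 { print; next }            # pass short or blank lines through unchanged
    {
        c = substr($3, 1, 1)          # first character of the third field
        w = length($NF)               # width of the old last field
        line = $0
        sub(/[ \t]+$/, "", line)      # ignore trailing blanks, if any
        head = substr(line, 1, length(line) - w)
        # right-align c in a field of width w, so the line length is unchanged
        printf("%s%" w "s\n", head, c)
    }' "$@"
}

# Example: the last field LIG also appears earlier in the line,
# but only the final occurrence is overwritten.
printf '%s\n' 'ATOM 1 NZ LIG X 1 0.0 LIG' | pad_last_field
```

Because the cut point is computed from the end of the line, an identical string earlier in the line is left alone, and the output line keeps the length of the input line.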