Copy the first character of a column and replace an existing column with it, while maintaining the spacing

I have a long file that looks like this:

ATOM 55 CE1 LIG X 1 -2.921 4.159 -10.046 1.00 0.00 LIGA

I need to take the first letter of the third column, in this case C (but it changes by line), and replace my last column, LIGA, with this character. I need to do this while ensuring the spacing between my 12th and 13th columns is 11, as shown below. I need it to be identical to the line below for my program to understand it.

ATOM 55 CE1 LIG X 1 -4.950 9.318 4.387 1.00 0.00 C

I managed to copy the first letter of the third column into a different file, then delete the 13th column from the original file, and paste the different file into the original file with the lines below. However, I can't find a way to fix the spacing.

cut -c 14 original.pdb > different.pdb
perl -pi -e 's/LIGA//g' original.pdb
paste original.pdb different.pdb >> joint.pdb
mv joint.pdb original.pdb

I know awk may work here. I haven't been able to achieve it. I appreciate the help!

CodePudding user response：

1st solution: With your shown samples and attempts, please try the following awk code. Written and tested in GNU awk (the three-argument form of match() is a GNU awk extension).

awk '
match($0,/(^[^[:space:]]+[[:space:]]+[^[:space:]]+[[:space:]]+)(.)([^[:space:]]*.*[[:space:]]+)/,arr){
  print arr[1] arr[2] arr[3] arr[2]
}
' Input_file

2nd solution: Using sed with its -E option to enable ERE here.

sed -E 's/(^[^[:space:]]+[[:space:]]+[^[:space:]]+[[:space:]]+)(.)([^[:space:]]*.*[[:space:]]+).*/\1\2\3\2/' Input_file

Here is the online demo for the regex shown above, (^[^[:space:]]+[[:space:]]+[^[:space:]]+[[:space:]]+)(.)([^[:space:]]*.*[[:space:]]+), for understanding purposes. (NOTE: the regex used on the site is slightly different, to satisfy the site's requirements; use only the regex shown in the code here.)
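To sanity-check the sed command on a single line before running it on the whole file, a sample can be piped in. This is only a sketch: the column spacing in the sample line below is an assumption (the whitespace in the question's sample looks collapsed), and the variable names line and result are just for illustration.

```shell
# Sample line with assumed spacing; adjust to match a real line from your file.
line='ATOM     55  CE1 LIG X   1      -2.921   4.159 -10.046  1.00  0.00      LIGA'

# Group 1: first two fields plus their trailing whitespace.
# Group 2: first character of the third field.
# Group 3: everything up to and including the whitespace before the last field.
# The trailing .* consumes the last field, which is replaced by group 2.
result=$(printf '%s\n' "$line" |
  sed -E 's/(^[^[:space:]]+[[:space:]]+[^[:space:]]+[[:space:]]+)(.)([^[:space:]]*.*[[:space:]]+).*/\1\2\3\2/')

printf '%s\n' "$result"
```

The new character lands where the first character of the old last field was, so the whitespace in front of it is unchanged.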

CodePudding user response：

perl -ape '$lc = substr $F[2],0,1; s/$F[11]/   $lc/' original.pdb

Use -a to autosplit into @F
Use -p to loop, -e to execute inline program
$lc = substr $F[2],0,1 - get first char of 3rd col as variable $lc
s/$F[11]/   $lc/ - replace 12th column with 3 spaces then $lc

This should get you close. I can't exactly follow the column counts and space counts.

But it's just counting on the 12th col being a unique string on the line, so that it can be replaced with $lc.

This also depends on 12th column 'LIGA' always being 4 chars. If that field is variable width, you could always replace all the chars in it with a space, and then replace the last char:

perl -ape '$lc = substr $F[2],0,1; ($new = $F[11]) =~ s/./ /g; $new =~ s/.$/$lc/; s/$F[11]/$new/' original.pdb

... again, $F[11] must be a unique string on the line, or an earlier occurrence will be replaced instead. The upside of depending on that is that you keep the char spacing of the original.
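If the uniqueness requirement is a problem, one option is to rebuild the end of the line by position instead of by pattern. Below is a minimal sketch in plain awk, not a tested drop-in: it assumes the last field ends the line with no trailing whitespace, and fix_last_col is a hypothetical helper name. It right-aligns the new character within the width of the old last field, so the line length stays the same.

```shell
# Hypothetical helper: replace the last field with the first character of
# field 3, right-aligned in the width the last field used to occupy.
fix_last_col() {
  awk '
    NF < 3 { print; next }                   # pass short or empty lines through
    {
      c = substr($3, 1, 1)                   # first character of the 3rd field
      n = length($NF)                        # width of the last field
      head = substr($0, 1, length($0) - n)   # the line without its last field
      fmt = "%s%" n "s\n"                    # e.g. "%s%4s\n" for a 4-char field
      printf fmt, head, c
    }
  ' "$@"
}
```

It could be called as fix_last_col original.pdb > fixed.pdb, or fed from a pipe. Because it cuts the last field off by length, it does not matter whether the same string appears earlier on the line.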