I have a large file, coord, that is as follows:
$coord
-6.81387808414325 5.82189470091282 -1.45477353169903 c f
3.12250219010826 1.39239934150351 0.78451413146001 o f
-4.76572488013335 -1.67551810949494 -1.58797087759328 c f
-0.15061495158492 -2.18614667480844 -2.60227003662941 c f
etc...
9.21060449992324 -2.77968508411378 0.71587738888748 h f
5.87109372056745 -2.67040600177892 0.54514819243204 h f
7.70747476642116 -1.85827163328137 -2.12317155170529 h f
3.16053583847830 1.75657003778612 4.21784993053015 h
3.20523873898751 2.06642906155866 6.03962166222879 o
3.84518636016769 0.52341324778083 6.76769535558585 h
$intdef
# definitions of internal coordinates
1 f 1.0000000000000 stre 6 21 val= 2.05908
2 f 1.0000000000000 stre 6 53 val= 2.07110
3 f 0.0463401612403 bend 53 21 6 val= 1.20720
0.5016372600998 bend 7 21 6
0.4983829790270 bend 7 53 6
etc...
There are keywords that start with $, such as $coord and $intdef. What I want to do is delete the f from each line after $coord. So the output should be:
$coord
-6.81387808414325 5.82189470091282 -1.45477353169903 c
3.12250219010826 1.39239934150351 0.78451413146001 o
-4.76572488013335 -1.67551810949494 -1.58797087759328 c
-0.15061495158492 -2.18614667480844 -2.60227003662941 c
etc...
9.21060449992324 -2.77968508411378 0.71587738888748 h
5.87109372056745 -2.67040600177892 0.54514819243204 h
7.70747476642116 -1.85827163328137 -2.12317155170529 h
3.16053583847830 1.75657003778612 4.21784993053015 h
3.20523873898751 2.06642906155866 6.03962166222879 o
3.84518636016769 0.52341324778083 6.76769535558585 h
$intdef
# definitions of internal coordinates
1 f 1.0000000000000 stre 6 21 val= 2.05908
2 f 1.0000000000000 stre 6 53 val= 2.07110
3 f 0.0463401612403 bend 53 21 6 val= 1.20720
0.5016372600998 bend 7 21 6
0.4983829790270 bend 7 53 6
etc...
The f's should only be deleted after the keyword $coord, not after any other keyword, and nothing else should be removed. So I can find the keyword $coord and stop deleting the f's at the next keyword. I tried to do this in bash. I figured out that I can delete the last column with awk:
awk 'NF{NF=-1};1'
And I can find the lines where the f is with sed:
sed -n '/$coord/,/\$/{/$coord/!{/\$/!p}}'
I tried to make a script but I am not able to figure out how I could use these to get the correct output. Or is there some easier way to do this? Can anyone help?
CodePudding user response:
You can use awk for that:
awk -v RS='$' -v ORS='$' '/^coord\n/ {gsub(/ f\n/,"\n")} 1' file
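One thing to watch with the record-separator approach: awk writes ORS after every record, including the last, so the output should end with an extra $ that was not in the input. A minimal sketch of one way around that, using a short made-up sample rather than the real file, is to print the $ before each record except the first:

```shell
# Hypothetical sample, much shorter than a real coord file.
tmp=$(mktemp)
printf '%s\n' '$coord' '1.0 2.0 3.0 c f' '4.0 5.0 6.0 h' '$intdef' '1 f 1.0 stre 6 21' > "$tmp"

# One-liner from above: ORS is also emitted after the last record.
orig=$(awk -v RS='$' -v ORS='$' '/^coord\n/ {gsub(/ f\n/,"\n")} 1' "$tmp")

# Variant: emit "$" before every record except the first, nothing after the last.
fixed=$(awk -v RS='$' 'NR>1 {printf "$"} /^coord\n/ {gsub(/ f\n/,"\n")} {printf "%s", $0}' "$tmp")

printf '%s\n' "$fixed"
rm -f "$tmp"
```

Both versions still assume that $ never appears anywhere except at the start of a keyword line.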
CodePudding user response:
This might work for you (GNU sed):
sed '/^$./h;G;/^$coord/Ms/f//g;P;d' file
Store a keyword line in the hold space.
Append the hold space to each line and, if the hold space begins with $coord, remove every f from that line.
Print the first line of the pattern space and then delete everything.
Thus, once a keyword appears, each line after it belongs to that keyword until the keyword changes.
The P command prints only the current line, as is, and not the appended copy of the keyword.
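Note that s/f//g deletes every f on the line and leaves the space in front of it. If that matters, a sketch of the same hold-space idea that strips only a final " f" might look like the following. The sample is made up, and the line with f in its fourth column is purely hypothetical:

```shell
# Hypothetical sample; the third data line has f as its fourth column too.
sample=$(printf '%s\n' '$coord' '1.0 2.0 3.0 c f' '4.0 5.0 6.0 h' '7.0 8.0 9.0 f f' '$intdef' '1 f 1.0 stre 6 21')

# h: remember the latest keyword line; G: append it to the current line.
# If the appended keyword is exactly $coord, drop only a trailing " f".
# P prints the current line; d discards the rest of the pattern space.
held=$(printf '%s\n' "$sample" | sed '/^\$/h;G;/\n\$coord$/s/ f\(\n\)/\1/;P;d')
printf '%s\n' "$held"
```

Because the address tests the text after the embedded newline, the M flag is not needed here, so this variant should not depend on GNU extensions.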
CodePudding user response:
In sed, you can easily restrict a command to a range of lines. The only slight difficulty here is that a typical command makes the replacement on every line in the range, and the easiest way to select the range is by its two header lines, where the replacement should not happen. So you can either restrict the command to exclude the headers, or you can extend the string to be replaced from merely f at the end of the line to " f" (a space followed by f) at the end of the line. You probably want to delete the space anyway, and maybe your headers can contain spaces, so it's not clear which of the following solutions is appropriate:
sed '/\$coord/,/^[$]/{/^[^$]/s/f$//;}'
sed '/\$coord/,/^[$]/s/ f$//'
The second solution is slightly simpler, but less robust, since it will fail if either of the keywords ends with " f". If your keywords cannot contain whitespace, this may not be a concern. The first solution removes only the f from the end of the line, so the space before it is retained. That is consistent with the description you've given, but perhaps not the behavior you actually want.
Both solutions do basically the same thing: apply the command s/f$// on a restricted range of lines. That command searches for the pattern f$ (or " f$"), which matches an f at the end of the line. In each command, the address range covers the lines between (and including) the line that matches $coord (the backslash causes sed to match a literal $ rather than the end of the line) and the next line that starts with a literal $. The first solution adds a second address, /^[^$]/, that prevents the command from being applied to a keyword line.
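To make the difference between the two commands visible, here is a small sketch that runs both on a made-up sample and marks the end of each line with |, so that a leftover trailing space shows up:

```shell
# Hypothetical sample, much shorter than a real coord file.
sample=$(printf '%s\n' '$coord' '1.0 2.0 3.0 c f' '4.0 5.0 6.0 h' '$intdef' '1 f 1.0 stre 6 21')

# First solution: skip lines starting with $, remove only the final f.
first=$(printf '%s\n' "$sample" | sed '/\$coord/,/^[$]/{/^[^$]/s/f$//;}')

# Second solution: remove a final " f" anywhere in the range.
second=$(printf '%s\n' "$sample" | sed '/\$coord/,/^[$]/s/ f$//')

# Append | to every line so trailing whitespace is visible.
printf '%s\n' "$first" | sed 's/$/|/'
printf '%s\n' "$second" | sed 's/$/|/'
```

In this sample the closing keyword, $intdef, itself ends in f, which is why the first command has to skip lines that start with $.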
CodePudding user response:
Using any awk:
$ awk '/^\$/{ key=$0 } key=="$coord"{ sub(/ f$/,"") } 1' file
$coord
-6.81387808414325 5.82189470091282 -1.45477353169903 c
3.12250219010826 1.39239934150351 0.78451413146001 o
-4.76572488013335 -1.67551810949494 -1.58797087759328 c
-0.15061495158492 -2.18614667480844 -2.60227003662941 c
etc...
9.21060449992324 -2.77968508411378 0.71587738888748 h
5.87109372056745 -2.67040600177892 0.54514819243204 h
7.70747476642116 -1.85827163328137 -2.12317155170529 h
3.16053583847830 1.75657003778612 4.21784993053015 h
3.20523873898751 2.06642906155866 6.03962166222879 o
3.84518636016769 0.52341324778083 6.76769535558585 h
$intdef
# definitions of internal coordinates
1 f 1.0000000000000 stre 6 21 val= 2.05908
2 f 1.0000000000000 stre 6 53 val= 2.07110
3 f 0.0463401612403 bend 53 21 6 val= 1.20720
0.5016372600998 bend 7 21 6
0.4983829790270 bend 7 53 6
etc...
By the way, in your question you said:
I figured out that I can delete the last column with awk:
awk 'NF{NF=-1};1'
but setting NF to a negative number is a semantic error so I suppose you probably meant:
awk 'NF{NF-=1};1'
but the effect of decrementing NF is undefined behavior per POSIX and even if it does delete the final field (as it will in some awks but not in others), that will trash the spacing of your input by converting all chains of spaces to single blanks. So, I wouldn't do that.
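The spacing issue can be seen without touching NF at all, since any assignment to a field makes awk rebuild the record. A small sketch, with a made-up line, where blanking the last field stands in for the NF decrement:

```shell
# Hypothetical line with irregular spacing, as a formatted coordinate line might have.
line='  -6.81   5.82  -1.45  c f'

# Assigning to a field makes awk rebuild $0, joining the fields with OFS (one space).
rebuilt=$(printf '%s\n' "$line" | awk '{ $NF = ""; print }')

# sub() edits $0 in place, so the original spacing is kept.
kept=$(printf '%s\n' "$line" | awk '{ sub(/ f$/, ""); print }')

# Brackets make leading and trailing blanks visible.
printf '[%s]\n[%s]\n' "$rebuilt" "$kept"
```

The rebuilt line loses its leading blanks and column alignment, while the sub() version differs from the input only by the removed " f".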