Bash: Delete last character from line if certain condition is fulfilled and if the line is between m

Time:06-07

I have a large file, coord, that looks like this:

$coord
   -6.81387808414325      5.82189470091282     -1.45477353169903  c f
    3.12250219010826      1.39239934150351      0.78451413146001  o f
   -4.76572488013335     -1.67551810949494     -1.58797087759328  c f
   -0.15061495158492     -2.18614667480844     -2.60227003662941  c f
    etc...
    9.21060449992324     -2.77968508411378      0.71587738888748  h f
    5.87109372056745     -2.67040600177892      0.54514819243204  h f
    7.70747476642116     -1.85827163328137     -2.12317155170529  h f
    3.16053583847830      1.75657003778612      4.21784993053015  h
    3.20523873898751      2.06642906155866      6.03962166222879  o
    3.84518636016769      0.52341324778083      6.76769535558585  h
$intdef
# definitions of internal coordinates
   1 f  1.0000000000000 stre    6   21           val=   2.05908
   2 f  1.0000000000000 stre    6   53           val=   2.07110
   3 f  0.0463401612403 bend   53   21    6      val=   1.20720
        0.5016372600998 bend    7   21    6
        0.4983829790270 bend    7   53    6
  etc...

There are keywords that start with $, such as $coord and $intdef. What I want to do is delete the trailing f from each line after $coord. So the output should be:

$coord
   -6.81387808414325      5.82189470091282     -1.45477353169903  c
    3.12250219010826      1.39239934150351      0.78451413146001  o
   -4.76572488013335     -1.67551810949494     -1.58797087759328  c
   -0.15061495158492     -2.18614667480844     -2.60227003662941  c
    etc...
    9.21060449992324     -2.77968508411378      0.71587738888748  h
    5.87109372056745     -2.67040600177892      0.54514819243204  h
    7.70747476642116     -1.85827163328137     -2.12317155170529  h
    3.16053583847830      1.75657003778612      4.21784993053015  h
    3.20523873898751      2.06642906155866      6.03962166222879  o
    3.84518636016769      0.52341324778083      6.76769535558585  h
$intdef
# definitions of internal coordinates
   1 f  1.0000000000000 stre    6   21           val=   2.05908
   2 f  1.0000000000000 stre    6   53           val=   2.07110
   3 f  0.0463401612403 bend   53   21    6      val=   1.20720
        0.5016372600998 bend    7   21    6
        0.4983829790270 bend    7   53    6
  etc...

The fs should be deleted only after the keyword $coord, not after any other keyword, and nothing else should be removed, just the fs. So I can find the keyword $coord and stop deleting the fs at the next keyword. I tried to do this in bash. I figured out that I can delete the last column with awk:

awk 'NF{NF=-1};1'

And I can find the lines where f is with sed:

sed -n '/$coord/,/\$/{/$coord/!{/\$/!p}}'

I tried to make a script but I am not able to figure out how I could use these to get the correct output. Or is there some easier way to do this? Can anyone help?

Answer:

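You can use awk for that. Setting RS to $ makes each keyword section one record; the $ is written back before every record except the first, so no stray $ is appended after the last section:

awk -v RS='$' -v ORS='' 'NR>1{printf "$"} /^coord\n/{gsub(/ f\n/,"\n")} 1' file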
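The command relies on awk treating each $-introduced section as a single record. A minimal sketch on a hypothetical, shortened sample file (sample.txt and records.txt are assumed names) prints the first line of each record, showing that every record starts with the keyword name without its $, and that record 1 is empty because the file begins with $:

```shell
#!/bin/sh
# Hypothetical, shortened sample in the same layout as the question.
printf '%s\n' '$coord' ' 1.0 c f' ' 2.0 h' '$intdef' ' 1 f stre' > sample.txt

# With RS='$' (a single character, so it is used literally), each keyword
# section becomes one record. Print the first line of every record.
awk -v RS='$' '{ split($0, part, "\n"); printf "record %d: [%s]\n", NR, part[1] }' \
    sample.txt > records.txt
cat records.txt
```

Because of that empty first record, writing $ back after every record (ORS='$') reproduces the leading $ but also appends one extra $ after the last section.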

Answer:

This might work for you (GNU sed):

sed '/^$./h;G;/^$coord/Ms/f//g;P;d' file

When a line is a keyword (it starts with $), store it in the hold space.

Append the hold space to each line, and if the hold space begins with $coord, remove any fs from that line.

Print the first line in the pattern space and then delete everything.

Thus when a keyword appears, each line thereafter will belong to that keyword until the keyword changes.

The P command prints only the first line of the pattern space, i.e. the current line (minus any removed fs), and not the appended copy of the hold space.
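To see what the one-liner works with, the sketch below (on a hypothetical, shortened sample.txt) runs only its first two commands, /^$./h and G, and then uses the POSIX l command to print each pattern space unambiguously: embedded newlines appear as \n and the end of the pattern space is marked with $.

```shell
#!/bin/sh
# Hypothetical, shortened sample (the real data lines are longer).
printf '%s\n' '$coord' ' 1.0 c f' ' 2.0 h' '$intdef' ' 1 f stre' > sample.txt

# /^$./h : a keyword line is copied to the hold space
# G      : a newline plus the hold space is appended to every line
# l      : show the resulting pattern space, newlines printed as \n
sed -n '/^$./h;G;l' sample.txt > pattern-space.txt
cat pattern-space.txt
```

Every line carries the current keyword after the newline, which is what /^$coord/M tests; P then prints only the part before that newline.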

Answer:

In sed, you can easily restrict a command to a range of lines. The only slight difficulty here is that the easiest way to select the range is with the two headers themselves, and a range includes the lines matching its start and end patterns, so the replacement would also be applied to the headers, where it should not happen. So you can either restrict the command to exclude the header lines, or you can extend the string to be replaced from merely f at the end of the line to " f" (a space followed by f) at the end of the line. You probably want to delete the space anyway, and maybe your headers can contain spaces, so it's not clear which of the following solutions is appropriate:

 sed '/\$coord/,/^[$]/{/^[^$]/s/f$//;}'
 sed '/\$coord/,/^[$]/s/ f$//'

The second solution is slightly simpler but less robust, since it will fail if either of the keyword lines ends with " f" (a space followed by f). If your keywords cannot contain whitespace, this is not a concern. The first solution removes only the f from the end of the line, and the space before it is retained. That is consistent with the description you've given, but perhaps not the behavior you actually want.

Both solutions do basically the same thing: apply the command s/f$// (or s/ f$//) to a restricted range of lines. The pattern f$ matches an f at the end of the line, and " f$" matches a space followed by an f at the end of the line. In each command, the address range runs from (and includes) the line matching \$coord (the backslash makes sed match a literal $ rather than the end of the line) through the next line that starts with a literal $. The first solution adds a second, nested address, /^[^$]/, that prevents the command from being applied to the keyword lines.
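A small sketch comparing the two commands on a hypothetical, shortened sample.txt; sed -n l is used only to make the trailing space left by the first solution visible, since it marks each line end with $:

```shell
#!/bin/sh
# Hypothetical, shortened sample in the same layout as the question.
printf '%s\n' '$coord' ' 1.0 c f' ' 2.0 h' '$intdef' ' 1 f stre' > sample.txt

# Solution 1: skip keyword lines inside the range, remove only the final f.
sed '/\$coord/,/^[$]/{/^[^$]/s/f$//;}' sample.txt > out1.txt
# Solution 2: remove " f" (space + f) on every line of the range.
sed '/\$coord/,/^[$]/s/ f$//' sample.txt > out2.txt

# Show both results with visible line ends.
sed -n l out1.txt
sed -n l out2.txt
```

In out1.txt the second line is " 1.0 c " (space kept), in out2.txt it is " 1.0 c"; the f in the $intdef section is left alone by both.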

Answer:

Using any awk:

$ awk '/^\$/{ key=$0 } key=="$coord"{ sub(/ f$/,"") } 1' file
$coord
   -6.81387808414325      5.82189470091282     -1.45477353169903  c
    3.12250219010826      1.39239934150351      0.78451413146001  o
   -4.76572488013335     -1.67551810949494     -1.58797087759328  c
   -0.15061495158492     -2.18614667480844     -2.60227003662941  c
    etc...
    9.21060449992324     -2.77968508411378      0.71587738888748  h
    5.87109372056745     -2.67040600177892      0.54514819243204  h
    7.70747476642116     -1.85827163328137     -2.12317155170529  h
    3.16053583847830      1.75657003778612      4.21784993053015  h
    3.20523873898751      2.06642906155866      6.03962166222879  o
    3.84518636016769      0.52341324778083      6.76769535558585  h
$intdef
# definitions of internal coordinates
   1 f  1.0000000000000 stre    6   21           val=   2.05908
   2 f  1.0000000000000 stre    6   53           val=   2.07110
   3 f  0.0463401612403 bend   53   21    6      val=   1.20720
        0.5016372600998 bend    7   21    6
        0.4983829790270 bend    7   53    6
  etc...
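If the section name may change, or the keyword line might carry trailing blanks, a variant of the same idea can take the section name as a parameter and compare only the first field of the keyword line. This is a sketch under those assumptions; sect, sample.txt and out.txt are hypothetical names:

```shell
#!/bin/sh
# Hypothetical, shortened sample in the same layout as the question.
printf '%s\n' '$coord' ' 1.0 c f' ' 2.0 h' '$intdef' ' 1 f stre' > sample.txt

# sect (hypothetical name) holds the section to clean up.
awk -v sect='$coord' '
    /^\$/       { key = $1 }         # a keyword line starts a new section
    key == sect { sub(/ f$/, "") }   # strip a trailing " f" in that section only
    1                                # print every line, changed or not
' sample.txt > out.txt
cat out.txt
```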

By the way, in your question you said:

I figured out that I can delete the last column with awk:

awk 'NF{NF=-1};1'

but setting NF to a negative number is a semantic error, so I suppose you meant:

awk 'NF{NF-=1};1'

but the effect of decrementing NF is undefined behavior per POSIX, and even where it does delete the final field (as it will in some awks but not in others), it will trash the spacing of your input: awk rebuilds the line from the remaining fields joined by single blanks, so every run of spaces becomes one blank and the leading spaces are lost. So I wouldn't do that.
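To illustrate the spacing point, here is a small sketch on one line from the question. It uses $1=$1 to force awk to rebuild the record (which is well defined, unlike changing NF), and then shows one possible alternative for dropping the last field: editing $0 as a string with sub(), assuming fields are separated by spaces or tabs. The file names rebuilt.txt and trimmed.txt are hypothetical.

```shell
#!/bin/sh
line='   -6.81387808414325      5.82189470091282     -1.45477353169903  c f'

# Assigning to any field rebuilds $0 from the fields joined by OFS
# (a single blank by default), so the column spacing is lost.
printf '%s\n' "$line" | awk '{ $1 = $1 } 1' > rebuilt.txt

# Removing the last field with sub() edits $0 as a plain string,
# so the spacing of the remaining columns is preserved.
printf '%s\n' "$line" | awk '{ sub(/[ \t]+[^ \t]+[ \t]*$/, "") } 1' > trimmed.txt

cat rebuilt.txt trimmed.txt
```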
