Replace a pattern between lines

Time:05-13

I am trying to replace a pattern between the lines of a file.

Specifically, I would like to replace ,\n & with , &\n in multiple large files. In effect, this moves the symbol & to the end of the previous line. This is very easy with Ctrl+H in an editor, but I found it difficult with sed.

So, the initial file is in the following form:

      A  ,
   &  B -,
   &  C ),
   &  D  ,
   &  E (,
   &  F *,
 # &  G -,
   &  H  ,
   &  I (,
   &  J  ,
      K ?,

The desired output is:

      A  , &
      B -, &
      C ), &
      D  , &
      E (, &
      F *, &
#  &  G -,
      H  , &
      I (, &
      J  ,
      K ?,

Following previously answered questions on Stack Overflow, I tried to convert it with the commands below:

sed ':a;N;$!ba;s/,\n &/&\n /g' file1.txt > file2.txt

sed -i -e '$!N;/&/b1' -e 'P;D' -e:1 -e 's/\n[[:space:]]*/ /' file2.txt

but they fail if the symbol "#" is present in the file.

Is there a simpler way to replace the matched pattern, for example: sed -i 's/,\n &/, &\n /g' file

Thank you in advance!

CodePudding user response:

Using sed

$ sed ':a;N;s/\n \ \(&\) \(.*\)/ \1\n     \2/;ba' input_file
      A  , &
      B -, &
      C ), &
      D  , &
      E (, &
      F *,
 # &  G -, &
      H  , &
      I (, &
      J  ,

CodePudding user response:

If you use GNU sed and your file does not contain NUL characters (ASCII code 0), you can use its -z option to process the whole file as one single string:

$ sed -Ez ':a;s/((\`|\n)[^\n#]*,)((\n[^\n#]*#[^\n]*)*)(\n[[:blank:]]*)&/\1 \&\3\5 /g;ta' file
      A  , &
      B -, &
      C ), &
      D  , &
      E (, &
      F *, &
 # &  G -,
      H  , &
      I (, &
      J  ,
      K ?,

This corresponds to your textual specification and to your desired output for the example you show. But it is a bit complicated. Instead of processing lines that end with a newline character it processes sub-strings that begin with a newline character (or the beginning of the file) and end before the next newline character. Let's name these "chunks".

We basically search for a sequence of chunks in the form AB*C where A is a chunk (possibly the first) not containing #, B* is any number (including none) of chunks containing #, and C is a chunk starting with a newline, followed by spaces and &.

A is matched by (\`|\n)[^\n#]*, (the trailing comma is part of the pattern), which means beginning-of-file-or-newline, followed by any number of characters except newline and #, followed by a comma.

B is matched by \n[^\n#]*#[^\n]* which means newline, followed by any number of characters except newline and #, followed by # and any number of characters except newline.

C is matched by \n[[:blank:]]*& which means newline, followed by any number of blanks and a &.

If we find such a sequence we add a space and a & at the end of A, we do not change B*, and we replace the first & in C by a space.

And we repeat until no such sequence is found.
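
For readers who find the single-string regex hard to follow, the same A, B*, C idea can be sketched line by line in awk. This is only an illustrative alternative, not the command from the answer above: the function name move_amp is made up for this example, and it assumes that any line containing # is to be skipped entirely and that lines have no trailing whitespace after the final comma.

```shell
# Hypothetical helper (name is an assumption): reads text on stdin and
# moves a leading & to the end of the previous non-# line ending in a comma.
move_amp() {
    awk '
    { line[NR] = $0 }                      # buffer the whole input
    END {
        a = 0                              # index of candidate line A (0 = none)
        for (i = 1; i <= NR; i++) {
            if (line[i] ~ /#/) continue    # B: lines containing # are left alone
            if (a && line[i] ~ /^[ \t]*&/) {
                line[a] = line[a] " &"     # append " &" to A
                sub(/&/, " ", line[i])     # blank out the first & in C
            }
            # the current line becomes the next candidate A if it ends with a comma
            a = (line[i] ~ /,$/) ? i : 0
        }
        for (i = 1; i <= NR; i++) print line[i]
    }'
}

# Possible usage: move_amp < file1.txt > file2.txt
```

Because it buffers every line in memory, this sketch shares the whole-file behaviour of sed -z; for very large files a streaming variant would be needed.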
