I am trying to replace a pattern that spans two lines of a file.
Specifically, I would like to replace ,\n &
with , &\n
in many large files. This effectively moves the & symbol to the end of the previous line. It is very easy with Ctrl+H (find and replace), but I found it difficult with sed.
So, the initial file is in the following form:
A ,
& B -,
& C ),
& D ,
& E (,
& F *,
# & G -,
& H ,
& I (,
& J ,
K ?,
The desired output is:
A , &
B -, &
C ), &
D , &
E (, &
F *, &
# & G -,
H , &
I (, &
J ,
K ?,
Following previously answered questions on Stack Overflow, I tried to convert it with the commands below:
sed ':a;N;$!ba;s/,\n &/&\n /g' file1.txt > file2.txt
sed -i -e '$!N;/&/b1' -e 'P;D' -e:1 -e 's/\n[[:space:]]*/ /' file2.txt
but they fail if the symbol "#" is present in the file.
Is there a simpler way to replace the matched pattern, something like:
sed -i 's/,\n &/, &\n /g' file
Thank you in advance!
CodePudding user response:
Using sed
$ sed ':a;N;s/\n \ \(&\) \(.*\)/ \1\n \2/;ba' input_file
A , &
B -, &
C ), &
D , &
E (, &
F *,
# & G -, &
H , &
I (, &
J ,
CodePudding user response:
If you use GNU sed and your file does not contain NUL characters (ASCII code 0), you can use its -z option to process the whole file as one single string:
$ sed -Ez ':a;s/((\`|\n)[^\n#]*,)((\n[^\n#]*#[^\n]*)*)(\n[[:blank:]]*)&/\1 \&\3\5 /g;ta' file
A , &
B -, &
C ), &
D , &
E (, &
F *, &
# & G -,
H , &
I (, &
J ,
K ?,
This corresponds to your textual specification and to your desired output for the example you show, but it is a bit complicated. Instead of processing lines that end with a newline character, it processes substrings that begin with a newline character (or the beginning of the file) and end before the next newline character. Let's name these "chunks".
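As a minimal sketch of what -z changes (assuming GNU sed, and using a made-up two-line input rather than the real file): without -z the pattern space holds one line at a time, so \n in the pattern never matches; with -z the newline is an ordinary character inside one long string.

```shell
# Hypothetical two-line sample; the exact spacing is an assumption.
sample='A ,
& B ,'

# Default mode: sed sees one line at a time, so ",\n&" never matches
# and the text comes back unchanged.
without_z=$(printf '%s\n' "$sample" | sed 's/,\n&/, \&\n /')

# GNU sed -z: the whole input is one string, so the newline can be matched
# and the & moved to the end of the previous line.
with_z=$(printf '%s\n' "$sample" | sed -z 's/,\n&/, \&\n /')

printf '%s\n---\n%s\n' "$without_z" "$with_z"
```

This simple form does nothing special for lines containing #, which is what the longer command has to handle.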
We basically search for a sequence of chunks of the form AB*C, where A is a chunk (possibly the first) not containing #, B* is any number (including none) of chunks containing #, and C is a chunk starting with a newline, followed by spaces and &.
A is matched by
(\`|\n)[^\n#]*,
which means beginning-of-file-or-newline, followed by any number of characters except newline and #, followed by a comma.
B is matched by
\n[^\n#]*#[^\n]*
which means newline, followed by any number of characters except newline and #, followed by # and any number of characters except newline.
C is matched by
\n[[:blank:]]*&
which means newline, followed by any number of blanks and a &.
If we find such a sequence, we add a space and a & at the end of A, we do not change B*, and we replace the first & in C with a space.
And we repeat until no such sequence is found.
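To try the command on the sample from the question, something like the following sketch could be used. It assumes GNU sed (for -E and -z together), and the sample is typed with single spaces, which may not match the spacing of the real files; move_amp is only a hypothetical wrapper name for this illustration.

```shell
# Hypothetical wrapper around the command from this answer (GNU sed assumed).
move_amp() {
    sed -Ez ':a;s/((\`|\n)[^\n#]*,)((\n[^\n#]*#[^\n]*)*)(\n[[:blank:]]*)&/\1 \&\3\5 /g;ta' "$@"
}

# Write the sample from the question to a temporary file
# (single spaces between tokens are an assumption).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
A ,
& B -,
& C ),
& D ,
& E (,
& F *,
# & G -,
& H ,
& I (,
& J ,
K ?,
EOF

# Run the substitution loop and show the result.
result=$(move_amp "$tmp")
rm -f "$tmp"
printf '%s\n' "$result"
```

Because each & is replaced by a space rather than deleted, the continuation lines keep some leading blanks in the output. Once the result looks right, the same sed script can be run with -i on the real files.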