I have a text file containing data that I am trying to make more readable. Some entries, e.g. info 2 below, span multiple lines: the line break is followed by a number of spaces (see below).
info 1 : holiday
info 2: today the weather is very \n\r
    hot
I would like to remove only those line breaks that are followed by a space, so that the continuation is joined onto the previous line. I have tried using
tr '\n\r ' ' ' < test.txt
but this removes all line endings. Is there a way to remove only those line endings followed by a space? I have quite a number of small files which I want to loop over.
Thanks in advance for any help!
Answer:
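tr is for TRanslating characters: each character in the first set is replaced by the character at the same position in the second set (GNU tr pads a shorter second set by repeating its last character, which is why tr '\n\r ' ' ' turns every newline, carriage return and space into a space). Because it works on individual characters, tr cannot match a sequence such as "newline followed by a space".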
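A quick way to see the character-by-character behaviour described above is to feed tr a small sample; the sample text here is made up to mirror the question.

```shell
#!/bin/sh
# tr translates single characters, never sequences: every newline is
# turned into a space, whether or not a space follows it.
out=$(printf 'info 1 : holiday\ninfo 2: very\n    hot\n' | tr '\n' ' ')
printf '%s\n' "$out"
```

The newline after "holiday", which is not followed by a space, is translated exactly like the one before "hot", so all lines end up merged into one.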
Is there a way
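Yes: match a newline followed by spaces and replace the whole sequence with a single space. Most Unix text tools process their input one line at a time, so a pattern never gets to see the newline itself; you need a tool or mode that works on the whole file. For example, GNU sed's -z option splits input on NUL bytes instead of newlines, so an ordinary text file is read as a single record (the \r\? also covers files with CR LF line endings):
sed -z 's/ *\r\?\n \+/ /g' test.txt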
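Since the question mentions looping over many small files, here is a sketch that applies a GNU sed -z substitution to each file in a for loop and writes the result next to the original. It assumes GNU sed (-z, \? and \+ are GNU features) and files without NUL bytes; the temporary directory and the a.txt / b.txt names are hypothetical examples.

```shell
#!/bin/sh
# Hypothetical sample files: one with CR LF endings, one with LF endings.
dir=$(mktemp -d)
printf 'info 1 : holiday\r\ninfo 2: today the weather is very \r\n    hot\r\n' > "$dir/a.txt"
printf 'info 1 : holiday\ninfo 2: very\n  hot\n' > "$dir/b.txt"

for f in "$dir"/*.txt; do
  # -z reads the whole file as one record, so the regex can see newlines;
  # optional trailing spaces, an optional CR, the newline and the leading
  # spaces of the next line are replaced by a single space.
  sed -z 's/ *\r\?\n \+/ /g' "$f" > "$f.out"
done

cat "$dir"/*.out
```

Writing to a separate .out file keeps the originals intact while testing; GNU sed's -i option would edit the files in place instead.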
Answer:
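Try:
tr -d '\r' < your_file | tr '\n' '#' | sed 's/ *#  */ /g' | tr '#' '\n'
Replace "#" with any character that does not occur in your text. The first tr deletes carriage returns, so the output has Unix (LF) line endings; without it, a CR LF pair becomes "##" and the sed pattern no longer matches.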
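Because the placeholder trick silently corrupts a file that already contains the placeholder character, a small guard helps when looping over many files. This is a sketch; the join_lines function name is hypothetical, and the pipeline assumes GNU sed for reading a final line that has no trailing newline.

```shell
#!/bin/sh
# Hypothetical helper: refuse to run if the placeholder already occurs.
join_lines() {
  if grep -q '#' "$1"; then
    echo "join_lines: '#' already occurs in $1; pick another placeholder" >&2
    return 1
  fi
  # Drop CRs, turn every newline into #, collapse "spaces # spaces" into
  # one space (only where the # is followed by a space), restore newlines.
  tr -d '\r' < "$1" | tr '\n' '#' | sed 's/ *#  */ /g' | tr '#' '\n'
}

input=$(mktemp)
printf 'info 1 : holiday\r\ninfo 2: today the weather is very \r\n    hot\r\n' > "$input"
out=$(join_lines "$input")
printf '%s\n' "$out"
```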
Answer:
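You could use sed for this. With GNU sed, the :a;N;...;ba loop collects the whole file in the pattern space; the optional \(\\n\\r\)\? group matches the characters \n\r if they literally appear in the file (as written in the question), and the substitution deletes that group, the real newline and the first two spaces of indentation: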
$ sed ':a;N;s/\(\\n\\r\)\?\n \ \(.*\)/\2/;ba' input_file
info 1 : holiday
info 2: today the weather is very hot
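If you prefer not to hold the whole file in memory, a common sed idiom keeps only a two-line window with $!N, P and D. This is a swapped-in technique rather than the answer above, and it assumes GNU sed (\? and \+ are GNU extensions); the sample input is made up.

```shell
#!/bin/sh
input=$(mktemp)
printf 'info 1 : holiday\ninfo 2: today the weather is very \n    hot\nextra line\n' > "$input"

# :a      label to loop back to
# $!N     append the next line (unless this is the last one)
# s///    if the newline is followed by spaces, join the two lines
# ta      after a successful join, loop to pick up further continuations
# P;D     otherwise print the first line and restart with the rest
out=$(sed ':a;$!N;s/ *\r\?\n \+/ /;ta;P;D' "$input")
printf '%s\n' "$out"
```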
Answer:
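Using perl, reading the whole file at once with -0777 (\r? accepts both CR LF and LF line endings, and " +" consumes all of the indentation):
perl -0777 -pe 's/ *\r?\n +/ /g' test.txt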
Answer:
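You could read all lines into the pattern space, then match the newline (together with an optional carriage return and any surrounding blanks)
and capture one of the following blanks in a group.
In the replacement, the backreference \1 puts the captured blank back,
so the two lines are joined by a single space.
sed ':a;$!{N;ba};s/[[:blank:]]*\r\?\n\([[:blank:]]\)[[:blank:]]*/\1/g' file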
Output after the replacement:
info 1 : holiday
info 2: today the weather is very hot
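A line-oriented tool can also do this if it buffers the current record instead of printing each line immediately. The following POSIX awk sketch is a swapped-in technique, not taken from the answers here; the join_continuations name is hypothetical, and it writes LF line endings even for CR LF input.

```shell
#!/bin/sh
join_continuations() {
  awk '
    { sub(/\r$/, "") }                    # strip a trailing CR (CR LF files)
    NR > 1 && /^[ \t]/ {                  # continuation line
      sub(/^[ \t]+/, "")                  # drop its leading blanks
      sub(/[ \t]+$/, "", buf)             # drop trailing blanks of the record
      buf = buf " " $0
      next
    }
    { if (NR > 1) print buf; buf = $0 }   # new record: flush the previous one
    END { if (NR > 0) print buf }
  ' "$1"
}

input=$(mktemp)
printf 'info 1 : holiday\r\ninfo 2: today the weather is very \r\n    hot\r\n' > "$input"
out=$(join_continuations "$input")
printf '%s\n' "$out"
```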