Remove line breaks followed by space in text file

I have a text file containing data that I am trying to make easier to read. Some entries, e.g. info 2 below, are spread over multiple lines: there is a line break followed by a number of spaces (see below).

info 1 : holiday
info 2: today the weather is very \n\r
       hot

I would like to remove every line break that is followed by a space. I have tried using

tr '\n\r ' '   ' < test.txt

but this removes all line endings. Is there a way to remove only those line endings followed by a space? I have quite a number of small files which I want to loop over.

Thanks in advance for any help!

CodePudding user response：

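tr is for TRanslating characters. It replaces each character in the first set with the corresponding character in the second set. The first argument is a set of individual characters, not a sequence to match, so tr cannot express "a newline followed by a space"; it simply replaces every newline, every carriage return and every space.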
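To make this concrete, here is a small sketch with a made-up sample (the make_sample helper and its CRLF line endings are assumptions for illustration, not taken from the question's files). It shows that the tr call from the question leaves no line break at all:

```shell
# Hypothetical sample input: one complete line and one wrapped line,
# using CRLF line endings (an assumption about the real files).
make_sample() {
    printf 'info 1 : holiday\r\ninfo 2: very \r\n   hot\r\n'
}

# tr treats '\n\r ' as three independent characters, not as a sequence,
# so each newline, carriage return and space becomes a space.
flattened=$(make_sample | tr '\n\r ' '   ')

# Everything ends up on a single line.
printf '[%s]\n' "$flattened"
```

The line break after "holiday" is gone as well, which is exactly the behaviour described in the question.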

Is there a way

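Yes: you have to match a newline followed by spaces and remove the whole match. Note that most Unix tools process their input line by line, so they never see a newline together with the start of the next line; you need a tool that can work on the whole file at once. For example, with GNU sed, where -z reads the file as a single record (as long as it contains no NUL bytes):

sed -z 's/\r\?\n \+//g' test.txt

This assumes each line break is a newline optionally preceded by a carriage return; adjust the pattern if your files really contain \n\r in that order.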
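Since the question mentions looping over many small files, the approach above can be wrapped in a loop. This is only a minimal sketch: it assumes GNU sed (for -z), LF or CRLF line endings, and that writing the result next to the input is acceptable. The function name join_wrapped and the .joined suffix are made up for illustration.

```shell
# join_wrapped FILE...: for each FILE, write FILE.joined in which every
# line break that is followed by spaces has been removed.
# (hypothetical helper name and output suffix)
join_wrapped() {
    for f in "$@"; do
        # -z makes GNU sed read the whole file as one record, so \n can
        # be matched; \r\? also accepts Windows (CRLF) line endings.
        sed -z 's/\r\?\n \+//g' "$f" > "$f.joined"
    done
}

# Usage (assumes the files end in .txt):
#   join_wrapped *.txt
```

Writing to a separate file keeps the originals untouched, which makes it easy to compare before and after with diff.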

CodePudding user response：

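Try:

tr -d '\r' < your_file | tr '\n' '#' | sed 's/#  *//g' | tr '#' '\n'

Replace "#" with any character that does not occur in your text. Note that tr -d '\r' also strips the carriage returns, so the output has plain \n line endings.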

CodePudding user response：

You could use GNU sed for this. The pattern also drops a literal \n\r written at the end of the line, if there is one:

$ sed ':a;N;s/\(\\n\\r\)\?\n \+//;ba' input_file
info 1 : holiday
info 2: today the weather is very hot

CodePudding user response：

Using perl:

perl -0777 -pe 's/ *\r?\n +/ /g' test.txt

CodePudding user response：

You could read all lines into the pattern space, then match the line break \n\r and capture the blank that follows it in a group.

In the replacement, use the backreference \1 to put the captured blank back, so that only the line break itself is removed and the spaces are kept.

sed ':a;$!{N;ba};s/\n\r\([[:blank:]]\)/\1/g' file

Output after the replacement:

info 1 : holiday
info 2: today the weather is very       hot