How to remove newlines between lines if next line not starting with a comma?

I've exported all of my text messages into a text file, and they are formatted like this:

, NAME  18001112222, RECV, Text message contents.
, NAME  18001112222, RECV, Text message contents that are run over to 
the line below it.
, NAME  18001112222, SENT, Text message contents that have

multiple lines and empty lines!
, NAME  18001112222, SENT, Text Message contents

I know how to remove the empty lines. How would one use awk, sed or grep to move every line that doesn't begin with a , to the end of the line above it?

Or how would you reformat this to ensure each text message has all of its contents on a single line?

I haven't tried anything yet because I'm unsure where to even begin. That's why I'm here, asking more practiced hands for some practical examples of how to go about solving this. Thanks in advance!

CodePudding user response：

Same idea as https://stackoverflow.com/a/73030681/10971581 :

awk -v ORS= '
    NR>1 && /^, / { print "\n" }
    1;
    END { print "\n" }
' inputfile

The input seems to be malformed CSV. One would normally expect fields that could contain newlines or the field delimiter (, ) to be quoted.

Note that it is impossible in general to determine if a line that starts with , is a continuation or intended to start a new line. The code above assumes it is always the latter.
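
One way to narrow that ambiguity is sketched below. It assumes that every genuine record carries RECV or SENT as its third comma-separated field, as in the sample. A line counts as a new record only when it has that shape, and everything else is glued onto the previous record. The function name join_records is made up for illustration, and the script sticks to POSIX awk features.

```shell
# join_records is a hypothetical helper name, not taken from the answers here.
join_records() {
  awk '
    # New record: ", <name and number>, RECV|SENT," (assumed record shape).
    /^, [^,]*, (RECV|SENT),/ {
      if (have) print rec
      rec = $0
      have = 1
      next
    }
    # Drop empty or whitespace-only lines.
    /^[ \t]*$/ { next }
    # Anything else is a continuation: trim trailing blanks, join with one space.
    {
      if (have) {
        sub(/[ \t]+$/, "", rec)
        rec = rec " " $0
      } else {
        rec = $0
        have = 1
      }
    }
    END { if (have) print rec }
  ' "$@"
}
```

Called as join_records inputfile, this keeps a continuation line that happens to start with a comma attached to its message. A message line that itself mimics the full record shape would still be misread, so the heuristic only reduces the ambiguity.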

CodePudding user response：

I would use GNU AWK for this task in the following way. Let the content of file.txt be

, NAME  18001112222, RECV, Text message contents.
, NAME  18001112222, RECV, Text message contents that are run over to 
the line below it.
, NAME  18001112222, SENT, Text message contents that have

multiple lines and empty lines!
, NAME  18001112222, SENT, Text Message contents

then

awk 'BEGIN{RS="\n,"}{ORS=RT;gsub(/\n/," ");print}' file.txt

gives output

, NAME  18001112222, RECV, Text message contents.
, NAME  18001112222, RECV, Text message contents that are run over to  the line below it.
, NAME  18001112222, SENT, Text message contents that have  multiple lines and empty lines!
, NAME  18001112222, SENT, Text Message contents

Explanation: I tell GNU AWK that the record separator (RS) is a newline (\n) followed by a comma (,). For each record I set the output record separator (ORS) to the current record terminator (RT), replace all newlines (\n) in the record with a space (depending on your requirements you might need to change that to an empty string), and then print the record, which is followed by its record terminator.

(tested in GNU Awk 5.0.1)
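
RT is a gawk extension, and POSIX leaves the behaviour of a multi-character RS unspecified, so the one-liner may not carry over to other awk implementations. As a rough portable sketch of the same idea, the function below works line by line instead. The name join_lines and its separator argument are assumptions made for illustration.

```shell
# join_lines SEP [FILE...]: hypothetical helper; SEP replaces each removed newline.
join_lines() {
  join_sep=$1
  shift
  awk -v sep="$join_sep" '
    # A line starting with "," closes the previous record.
    NR > 1 && /^,/  { printf "\n" }
    # Any other line (including an empty one) is glued on with the separator.
    NR > 1 && !/^,/ { printf "%s", sep }
    { printf "%s", $0 }
    # Terminate the last record, unless the input was empty.
    END { if (NR) printf "\n" }
  ' "$@"
}
```

For example, join_lines ' ' file.txt joins with a space, and join_lines '' file.txt joins with nothing. Because empty lines also contribute a separator, a blank line inside a message yields two spaces in a row.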

CodePudding user response：

Using GNU sed

$ sed -E ':a;/^,/{N;s/ *\n($|[a-z])/ \1/;ba}' input_file
, NAME  18001112222, RECV, Text message contents.
, NAME  18001112222, RECV, Text message contents that are run over to  the line below it.
, NAME  18001112222, SENT, Text message contents that have  multiple lines and empty lines!
, NAME  18001112222, SENT, Text Message contents

CodePudding user response：

This might work for you (GNU sed):

sed ':a;N;/\n$\|\n[^,]/s/\n//;ta;P;D' file

Append the next line and, if it is empty or does not begin with a comma, remove the newline and go again. Otherwise, print/delete the first line and go again.

N.B. The D command inhibits the automatic replenishing of the pattern space with the next line when the pattern space is not empty, i.e. when stuff is left over from before.
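
The command as written joins the lines with nothing in between. If a separator is wanted, a small variation substitutes a space for the newline instead of deleting it. This is only a sketch: it still assumes GNU sed, and the wrapper name join_with_space is made up.

```shell
# join_with_space is a hypothetical wrapper name; requires GNU sed (uses \| in the address).
join_with_space() {
  # :a      label for the loop
  # N       append the next input line to the pattern space
  # /.../   appended line is empty or does not start with ",":
  #           replace the newline with a space, then loop via ta
  # P;D     otherwise print and delete the first line, keep the rest
  sed ':a;N;/\n$\|\n[^,]/s/\n/ /;ta;P;D' "$@"
}
```

An empty line inside a message adds its own space, so such messages end up with two spaces at the join.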

CodePudding user response：

You could use a Perl One-Liner (RexEgg explanation).

perl -0777 -pe 's/\n+(?!,)/ /g;' yourfile

Here is a demo at regex101 or a bash demo at tio.run

This replaces one or more \n newlines with a space if they are not followed by a comma.
To avoid replacing the newline at the end of the string, modify the lookahead: (?!$|,)

CodePudding user response：

GNU sed with the -z option:

sed -rz ':a;s/\n([^,])/\1/g;ta' inputfile