I've exported all of my text messages into a text file, and they are formatted like this:
, NAME 18001112222, RECV, Text message contents.
, NAME 18001112222, RECV, Text message contents that are run over to
the line below it.
, NAME 18001112222, SENT, Text message contents that have
multiple lines and empty lines!
, NAME 18001112222, SENT, Text Message contents
I know how to remove the empty lines. How would one use awk, sed or grep to move every line that doesn't begin with a , to the end of the line above it?
Or, put differently, how would you reformat this so that each text message has all of its contents on a single line?
I haven't tried anything yet because I'm unsure where to even begin, which is why I'm here asking more practiced hands for some practical examples of how to go about solving this. Thanks in advance!
CodePudding user response:
Same idea as https://stackoverflow.com/a/73030681/10971581 :
awk -v ORS= '
NR>1 && /^, / { print "\n" }
1;
END { print "\n" }
' inputfile
The input seems to be malformed CSV. One would normally expect fields that could contain newlines or the field delimiter (,) to be quoted.
Note that it is impossible in general to determine whether a line that starts with , is a continuation or is intended to start a new message. The code above assumes it is always the latter.
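The same accumulate-and-flush idea can also be written so that continuation lines are joined with a single space and empty lines are skipped. This is only a sketch under the same assumption (every line starting with , begins a new message); the function name join_messages and the sample input are made up for illustration.

```shell
# join_messages is a hypothetical helper name; it reads files or stdin.
join_messages() {
  awk '
    # A line starting with "," begins a new message: flush the previous one.
    /^,/ { if (buf != "") print buf; buf = $0; next }
    # Any other non-blank line is a continuation: append it after a space.
    NF   { buf = (buf == "" ? $0 : buf " " $0) }
    # Flush the last message.
    END  { if (buf != "") print buf }
  ' "$@"
}

# Made-up sample input, including an empty line inside a message.
printf '%s\n' \
  ', NAME 18001112222, RECV, first' \
  'second' \
  '' \
  'third' \
  ', NAME 18001112222, SENT, fourth' | join_messages
```

Unlike the ORS= version, this keeps one message in memory at a time and prints it only once the next message starts, which makes it easy to choose the joining character.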
CodePudding user response:
I would use GNU AWK for this task in the following way. Let the content of file.txt be
, NAME 18001112222, RECV, Text message contents.
, NAME 18001112222, RECV, Text message contents that are run over to
the line below it.
, NAME 18001112222, SENT, Text message contents that have
multiple lines and empty lines!
, NAME 18001112222, SENT, Text Message contents
then
awk 'BEGIN{RS="\n,"}{ORS=RT;gsub(/\n/," ");print}' file.txt
gives output
, NAME 18001112222, RECV, Text message contents.
, NAME 18001112222, RECV, Text message contents that are run over to the line below it.
, NAME 18001112222, SENT, Text message contents that have multiple lines and empty lines!
, NAME 18001112222, SENT, Text Message contents
Explanation: I inform GNU AWK that the record separator (RS) is a newline (\n) followed by a comma (,). Then, for each record, I set the output record separator (ORS) to the current record terminator (RT), replace all newlines (\n) in the record with a space (depending on your requirements you might need to change that to an empty string), and print the record, which is then suffixed by the record terminator.
(tested in GNU Awk 5.0.1)
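One detail worth checking with the RS/RT approach is the final record: at end of input RT is empty, so the last message may come out without a trailing newline. Below is a minimal sketch of one way to handle that, assuming GNU awk is installed under the name gawk. The function name join_with_rt is hypothetical.

```shell
# join_with_rt is a hypothetical helper name; RT is a GNU awk extension.
join_with_rt() {
  gawk '
    BEGIN { RS = "\n," }
    {
      sub(/\n+$/, "")     # drop newlines left at the end of the record
      gsub(/\n+/, " ")    # join continuation lines, collapsing empty lines
      # Re-emit the terminator that was consumed, or a newline at end of input.
      printf "%s%s", $0, (RT != "" ? RT : "\n")
    }
  ' "$@"
}
```

It would be called as join_with_rt file.txt. Using printf instead of assigning ORS keeps the choice of terminator in one visible place.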
CodePudding user response:
Using GNU sed
$ sed -E ':a;/^,/{N;s/ *\n($|[a-z])/ \1/;ba}' input_file
, NAME 18001112222, RECV, Text message contents.
, NAME 18001112222, RECV, Text message contents that are run over to the line below it.
, NAME 18001112222, SENT, Text message contents that have multiple lines and empty lines!
, NAME 18001112222, SENT, Text Message contents
CodePudding user response:
This might work for you (GNU sed):
sed ':a;N;/\n$\|\n[^,]/s/\n//;ta;P;D' file
Append the next line and, if it is empty or does not begin with a comma (,), remove the newline and go again. Otherwise, print/delete the first line and go again.
N.B. The D command inhibits the automatic replenishing of the pattern space with the next line when the pattern space is not empty, i.e. when something is left over from before.
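The N/P/D loop described above can be spelled out one command per line. The sketch below is a variant rather than the exact one-liner: it joins continuation lines with a space instead of nothing, and it handles empty lines in a separate substitution. It assumes GNU sed (which prints the pattern space when N finds no further input); join_sed is a hypothetical function name.

```shell
# join_sed is a hypothetical helper name; the script relies on GNU sed behaviour.
join_sed() {
  sed '
    :a
    # Append the next input line to the pattern space.
    N
    # The appended line is empty: drop its newline and read on.
    s/\n$//
    ta
    # The appended line does not start with ",": join it with a space and read on.
    s/\n\([^,]\)/ \1/
    ta
    # Otherwise print the completed message (up to the newline) ...
    P
    # ... and delete it, restarting with the line that was left over.
    D
  ' "$@"
}
```

It would be called as join_sed file, or fed from a pipe.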
CodePudding user response:
You could use a Perl One-Liner (RexEgg explanation).
perl -0777 -pe 's/\n+(?!,)/ /g;' yourfile
Here is a demo at regex101 or a bash demo at tio.run
This replaces one or more \n newlines with a space if they are not followed by a comma.
To prevent removing newlines at the string end, modify the lookahead: (?!$|,)
CodePudding user response:
GNU sed with the -z option:
sed -rz ':a;s/\n([^,])/\1/g;ta' inputfile