Substitute newlines with a string with awk

Time:01-23

I need to parse stdin in the following way:

(1) all newline characters must be substituted with \n (a literal \ followed by n)

(2) nothing else in the input should be changed

I chose awk to do it, and I would like an answer that uses awk if possible.

I came up with:

echo -ne "A\nB\nC" | awk '{a[NR]=$0;} END{for(i=1;i<NR;i++){printf "%s\\n",a[i];};printf "%s",a[NR];}'

But it looks cumbersome.

Is there a better / cleaner way?

CodePudding user response:

With awk:

echo -ne "A\nB\nC" | awk 'BEGIN{FS="\n"; OFS="\\n"; RS=ORS=""} {$1=$1}1'

Output:

A\nB\nC

See: 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR
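
One caveat worth checking before relying on this: setting RS to the empty string puts awk in paragraph mode, so empty lines act as record separators rather than data. A minimal sketch that feeds the command an input containing an empty line (the function name join_paragraph_mode is only an illustrative label, not part of the answer):

```shell
# Hypothetical wrapper around the awk command from this answer.
join_paragraph_mode() {
  awk 'BEGIN{FS="\n"; OFS="\\n"; RS=ORS=""} {$1=$1}1'
}

# Input with an empty line between B and C.
printf 'A\nB\n\nC' | join_paragraph_mode
# prints A\nBC : the empty line is consumed as a paragraph separator
```

If the input can contain empty lines, or the trailing newline must be preserved, this variant is probably not a good fit.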

CodePudding user response:

  • Handling malformed files (i.e. files that don't end with the record separator) with awk is tricky.

  • sed -z is GNU-specific and has the side effect of slurping the whole (text) file into RAM, which might be an issue for huge files.

Thus, for a robust and reasonably portable solution I would use perl:

perl -pe 's/\n/\\n/'

CodePudding user response:

Using GNU awk for multi-char RS:

$ echo -ne "A\nB\n\nC" | awk -v RS='^$' -v ORS= -F'\n' -v OFS='\\n' '{$1=$1} 1'
A\nB\n\nC$

You need to use GNU awk for this as no other awk will tell you if the input ended with \n or not and so no other awk

CodePudding user response:

I would harness GNU AWK for this task in the following way

echo -ne "A\nB\nC" | awk '{printf "%s%s",$0,RT?"\\n":""}'

gives output

A\nB\nC

(without trailing newline)

Explanation: the output string is built from the content of the current line ($0) followed by either a backslash and n or an empty string, depending on RT, which is the record terminator for the current line. RT is a newline for every line that ends with one and an empty string for a last line that has no trailing newline, so in a boolean context it is true for every newline-terminated line. I used the so-called ternary operator here: condition ? valueiftrue : valueiffalse.

(tested in GNU Awk 5.0.1)
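
The effect of RT can be checked by running the same program on input with and without a final newline. This is only a sketch: it assumes that awk on the system is GNU awk (RT is a gawk extension and is empty in other awks), and the wrapper name escape_with_rt is invented for the illustration.

```shell
# Hypothetical wrapper; assumes awk is GNU awk, since RT is a gawk extension.
escape_with_rt() {
  awk '{ printf "%s%s", $0, (RT ? "\\n" : "") }'
}

# The trailing echo only adds a real newline so the results are easy to compare.
printf 'A\nB\nC'   | escape_with_rt; echo   # no final newline in the input
printf 'A\nB\nC\n' | escape_with_rt; echo   # final newline in the input
```

With GNU awk the first call should give A\nB\nC and the second A\nB\nC\n, since RT is empty only for a last record that was not terminated by a newline.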
