Sorting paragraphs in bash (awk or sed)?

Time:02-26

To sort several paragraphs alphabetically, I tried:

awk 'BEGIN { RS="" } { a[FNR]=$0 } END { PROCINFO["sorted_in"]="@val_str_asc"; for (i in a) print a[i] ORS }' myrecords.txt

But it won't sort. Sample records:

Ham  
this is good  
(mind the mail)

Cheese  
I'm fine

Turkey
(empty)

Blocks of text might have one or more lines, separated by one or more blank lines or even by a date instead of a blank line. The latter can be solved by replacing the date with a blank line.
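The date-to-blank-line replacement could look roughly like the sketch below. The question does not say what the dates look like, so the ISO-style YYYY-MM-DD pattern is purely an assumption, and blank_date_lines is a hypothetical helper name; adjust the regular expression to the real format.

```shell
# Hypothetical helper: turn lines that consist solely of a date into empty lines,
# so that awk's paragraph mode (RS="") treats them as record separators.
# ASSUMPTION: dates look like 2024-02-26; the real format is not given in the question.
blank_date_lines() {
    # [[:space:]]* also swallows trailing blanks or a carriage return after the date.
    sed -E 's/^[0-9]{4}-[0-9]{2}-[0-9]{2}[[:space:]]*$//' "$@"
}
```

Called as blank_date_lines myrecords.txt, it writes the adjusted text to standard output and leaves the file itself untouched; with no arguments it reads standard input.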

Desired result:

Cheese
I'm fine

Ham 
this is good 
(mind the mail)

Turkey 
(empty)

Answer:

Judging by the output shown in your comment, every line ends in a control-M (carriage return), so the lines that look empty actually aren't. As a result, with RS set to the empty string (paragraph mode), awk sees your whole file as a single record. Run dos2unix or sed 's/\r$//' on your input file to remove those CRs, then run the awk command again. Compare the output below before and after sed strips the CRs from the input:

$ cat -Ev file
Ham  ^M$
this is good  ^M$
(mind the mail)^M$
^M$
Cheese  ^M$
I'm fine^M$
^M$
Turkey^M$
(empty)^M$

$ awk -v RS= '{print NR, "<" $0 ">"}' file | cat -Ev
1 <Ham  ^M$
this is good  ^M$
(mind the mail)^M$
^M$
Cheese  ^M$
I'm fine^M$
^M$
Turkey^M$
(empty)^M>$

$ sed 's/\r$//' file > tmp && mv tmp file

$ cat -Ev file
Ham  $
this is good  $
(mind the mail)$
$
Cheese  $
I'm fine$
$
Turkey$
(empty)$

$ awk -v RS= '{print NR, "<" $0 ">"}' file | cat -Ev
1 <Ham  $
this is good  $
(mind the mail)>$
2 <Cheese  $
I'm fine>$
3 <Turkey$
(empty)>$

See "Why does my tool output overwrite itself and how do I fix it?" for more information on those DOS line endings.
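If you would rather not rewrite the input file, the CR removal and the sort can be chained in one pipeline. This is only a sketch: sort_paragraphs is a hypothetical function name, the \r escape in sed assumes GNU sed, and PROCINFO["sorted_in"] needs GNU awk, so gawk is called explicitly.

```shell
# Sketch: sort blank-line-separated paragraphs alphabetically, tolerating DOS line endings.
# ASSUMPTIONS: GNU sed (for the \r escape) and GNU awk 4.0+ (for PROCINFO["sorted_in"]).
sort_paragraphs() {
    # Strip a trailing carriage return from every line so empty lines are really empty.
    sed 's/\r$//' "$@" |
    gawk 'BEGIN { RS = "" }          # paragraph mode: records are separated by blank lines
          { a[NR] = $0 }             # remember every paragraph
          END {
              # Traverse the array in ascending order of its string values.
              PROCINFO["sorted_in"] = "@val_str_asc"
              for (i in a) print a[i] ORS   # the extra ORS emits a blank line after each paragraph
          }'
}
```

Running sort_paragraphs myrecords.txt prints the sorted paragraphs to standard output, each followed by a blank line, and leaves the original file unchanged.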
