I would like to sort the paragraphs in a file alphabetically by their first line:
Hampton
this is good
(mind the mail)

Burlington
I'm fine

Greater Yukonshire Fields
(empty)
Each block of text may consist of one or more lines, and the blocks are separated by one or more blank lines.
Desired result:
Burlington
I'm fine

Greater Yukonshire Fields
(empty)

Hampton
this is good
(mind the mail)
CodePudding user response:
One GNU awk idea:
awk 'BEGIN { RS="" }
{ a[FNR]=$0 }
END { PROCINFO["sorted_in"]="@val_str_asc"
for (i in a)
print a[i] ORS
}
' paragraphs
NOTE: requires GNU awk for PROCINFO["sorted_in"]
This generates:
Burlington
I'm fine

Greater Yukonshire Fields
(empty)

Hampton
this is good
(mind the mail)
CodePudding user response:
You could try msort, which is available for most Linux distributions:
msort -bwq file
Output:
Burlington
I'm fine

Greater Yukonshire Fields
(empty)

Hampton
this is good
(mind the mail)
Options:
-b  A record is terminated by two or more newlines
-w  Sort on the entire text of the record
-q  Be quiet - do not chat while working
CodePudding user response:
Using perl:
$ perl -00 -lne '
push @paras, [ substr($_, 0, index($_, "\n")), $_ ];
END {
for my $para (sort { $a->[0] cmp $b->[0] } @paras) {
print $para->[1]
}
}' input.txt
Burlington
I'm fine

Greater Yukonshire Fields
(empty)

Hampton
this is good
(mind the mail)
The -00 option makes perl read in "paragraph mode" instead of line by line, so that one or more blank lines separate the records. For each paragraph, the script extracts the first line and saves it together with the whole paragraph in a list. After reading the entire file, it sorts on the first line and prints the paragraphs.
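The same pairing of "first line as key, whole paragraph as value" can be sketched in POSIX awk, without gawk extensions or an external sort. This is only an illustration under assumptions: the function name sort_by_first_line is hypothetical, the keys are compared as plain strings, and the insertion sort is meant for files with a modest number of paragraphs.

```shell
# Hypothetical helper: sort blank-line-separated paragraphs by their first line.
# Reads the files given as arguments, or standard input if there are none.
sort_by_first_line() {
  awk '
    BEGIN { RS = ""; FS = "\n" }        # paragraph mode; each line is a field
    { key[NR] = $1; para[NR] = $0 }     # remember first line and whole paragraph
    END {
      # Insertion sort on the keys, moving the paragraphs in step.
      # Appending "" forces a string comparison even for numeric-looking keys.
      for (i = 2; i <= NR; i++) {
        k = key[i]; p = para[i]
        for (j = i - 1; j >= 1 && (key[j] "") > (k ""); j--) {
          key[j + 1] = key[j]; para[j + 1] = para[j]
        }
        key[j + 1] = k; para[j + 1] = p
      }
      for (i = 1; i <= NR; i++) {
        print para[i]
        if (i < NR) print ""            # one blank line between paragraphs
      }
    }
  ' "$@"
}

# Try it on the sample from the question.
sort_by_first_line <<'EOF'
Hampton
this is good
(mind the mail)

Burlington
I'm fine


Greater Yukonshire Fields
(empty)
EOF
```

Because only the first line is used as the key, paragraphs with identical first lines keep their original relative order.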
CodePudding user response:
Using awk (asort requires GNU awk):
One way reading linewise:
awk '
{if (NF) a[p]=(a[p] $0 ORS); else p++ } # Collect; a blank line starts a new paragraph
END {n=asort(a); for (i=1; i<=n; i++) print a[i]} # Sort and Output in index order
' input.txt
Another way reading paragraphwise:
awk -v RS='\n{2,}' '
{a[FNR]=$0} # Collect
END {n=asort(a); for (i=1; i<=n; i++) print a[i] ORS} # Sort and Output in index order
' input.txt
Output:
Burlington
I'm fine

Greater Yukonshire Fields
(empty)

Hampton
this is good
(mind the mail)
Both variants collect each paragraph as a single string in an array, which is then sorted and printed.
CodePudding user response:
An approach using ruby:
- First initialize a counter i and a 2-dimensional array arr, then append each line $_ to the current paragraph
- If it finds an empty line, increment the counter
- Append a newline to the last paragraph (its last line didn't have one)
- Finally print the sorted array
% ruby -ne 'i ||= 0; arr ||= []; arr[i] ||= []; arr[i] << $_
i += 1 if $_.length == 1
END{ arr[i] << ""
puts arr.sort }' file
Burlington
I'm fine

Greater Yukonshire Fields
(empty)

Hampton
this is good
(mind the mail)
CodePudding user response:
Using any awk plus sort, and assuming you don't have any \rs in your data:
$ awk -v RS= -F'\n' -v OFS='\r' '{$1=$1}1' file |
sort |
awk -v ORS='\n\n' -F'\r' -v OFS='\n' '{$1=$1}1'
Burlington
I'm fine

Greater Yukonshire Fields
(empty)

Hampton
this is good
(mind the mail)
We're just joining the lines of each paragraph together with the first awk, then sorting the joined lines, then breaking them apart again with the second awk:
$ awk -v RS= -F'\n' -v OFS='\r' '{$1=$1}1' file | cat -Ev
Hampton ^Mthis is good ^M(mind the mail)$
Burlington ^MI'm fine$
Greater Yukonshire Fields ^M(empty)$
$ awk -v RS= -F'\n' -v OFS='\r' '{$1=$1}1' file | sort | cat -Ev
Burlington ^MI'm fine$
Greater Yukonshire Fields ^M(empty)$
Hampton ^Mthis is good ^M(mind the mail)$
The pipe to cat -Ev is just so you can see the otherwise invisible CRs (aka \r, aka ^M).
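As a rough sketch, the join / sort / split pipeline could be wrapped in a function that also checks the "no \r in the data" assumption before running. The function name sort_paragraphs is hypothetical, and restricting sort to the first CR-separated field (so only the first lines are compared) is a variation that is not part of the commands above.

```shell
# Hypothetical wrapper around the join / sort / split pipeline.
# Usage: sort_paragraphs FILE
sort_paragraphs() {
  cr=$(printf '\r')
  # Refuse input that already contains CRs: they would be mistaken for joins.
  if grep -q "$cr" "$1"; then
    echo "sort_paragraphs: $1 contains carriage returns" >&2
    return 1
  fi
  awk -v RS= -F'\n' -v OFS='\r' '{$1=$1}1' "$1" |    # one paragraph per line
    sort -t "$cr" -k1,1 |                             # compare first lines only
    awk -v ORS='\n\n' -F'\r' -v OFS='\n' '{$1=$1}1'   # restore the line breaks
}
```

Calling sort_paragraphs file prints the sorted paragraphs, each followed by a blank line. When two first lines are equal, sort falls back to comparing the whole joined line.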