I can't find a way to sort the characters of a word in awk. For example, if the word is "hello", its sorted equivalent is "ehllo". How can I achieve this in awk?
CodePudding user response:
With GNU awk, using PROCINFO["sorted_in"] (see https://www.gnu.org/software/gawk/manual/gawk.html#Controlling-Scanning) and splitting with a null separator, which results in an array of characters:
$ echo 'hello' |
awk '
BEGIN { PROCINFO["sorted_in"]="@val_str_asc" }
{
    split($1,chars,"")
    word = ""
    for (i in chars) {
        word = word chars[i]
    }
    print word
}
'
ehllo
$ echo 'hello' | awk -v ordr='@val_str_asc' 'BEGIN{PROCINFO["sorted_in"]=ordr} {split($1,chars,""); word=""; for (i in chars) word=word chars[i]; print word}'
ehllo
$ echo 'hello' | awk -v ordr='@val_str_desc' 'BEGIN{PROCINFO["sorted_in"]=ordr} {split($1,chars,""); word=""; for (i in chars) word=word chars[i]; print word}'
ollhe
CodePudding user response:
Another option is a Decorate-Sort-Undecorate approach with sed. Essentially, you use sed to break "hello" into one character per line (decorating each character with a newline '\n') and pipe the result to sort. You then use sed to do the reverse (undecorating each line by removing the '\n') to join the lines back together.
printf "hello" | sed 's/\(.\)/\1\n/g' | sort | sed '{:a N;s/\n//;ta}'
ehllo
There are several approaches you can use. This one is shell-friendly, but it requires GNU sed.
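If GNU sed is not available, the same split, sort, and join idea can be sketched with fold, sort, and tr instead. This is a variant, not the pipeline above: the helper name sort_chars_pipe is hypothetical, and the sketch assumes single-byte (ASCII) characters, since fold may count bytes rather than characters.

```shell
# sort_chars_pipe WORD: hypothetical helper, same idea as the sed pipeline
# but built from fold, sort and tr (assumes ASCII input).
sort_chars_pipe() {
  printf '%s\n' "$1" |
    fold -w 1 |        # one character per line
    LC_ALL=C sort |    # byte-order sort, independent of the locale
    tr -d '\n'         # join the lines back together
  printf '\n'          # restore the trailing newline removed by tr
}

sort_chars_pipe hello
```

Running it on "hello" prints ehllo, the same result as the sed version.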
CodePudding user response:
This is easier with gawk, which includes the asort function to sort an array. Loop over the indices in order, because the traversal order of for (i in a) is not guaranteed:
awk 'BEGIN{FS=OFS=ORS=""}{n=split($0,a);asort(a);for(i=1;i<=n;i++)print a[i]}'<<<hello
This outputs:
ehllo
Demo: https://ideone.com/ylWQLJ
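For an awk without asort or PROCINFO["sorted_in"], a minimal sketch is to copy the characters into an array with substr and sort them by hand. The insertion sort and the helper name sort_chars below are assumptions made for illustration, not part of any answer here, and the sketch assumes ASCII input (some awks count bytes in length and substr).

```shell
# sort_chars WORD: print WORD with its characters in ascending order.
# Hypothetical helper name; uses only POSIX awk features (length, substr).
sort_chars() {
  printf '%s\n' "$1" | awk '
    {
      n = length($0)
      # Copy each character into c[1..n].
      for (i = 1; i <= n; i++) c[i] = substr($0, i, 1)
      # Insertion sort using string comparison.
      for (i = 2; i <= n; i++) {
        key = c[i]
        j = i - 1
        while (j >= 1 && c[j] > key) {
          c[j + 1] = c[j]
          j--
        }
        c[j + 1] = key
      }
      # Join the sorted characters back into one word.
      out = ""
      for (i = 1; i <= n; i++) out = out c[i]
      print out
    }'
}

sort_chars hello
```

This prints ehllo. Insertion sort is quadratic in the word length, which should be acceptable for ordinary words but not for very long lines.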
CodePudding user response:
You need to write a function that sorts the letters in a word (see https://www.gnu.org/software/gawk/manual/html_node/Join-Function.html):
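# siw ("sort in word"): return the letters of word in sorted order.
function siw(word,    result, arr, arrlen, arridx) {
    split(word, arr, "")      # gawk: an empty separator splits into characters
    arrlen = asort(arr)       # sort the characters in place
    for (arridx = 1; arridx <= arrlen; arridx++) {
        result = result arr[arridx]
    }
    return result
}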
Then define a comparison function to compare two words (see https://www.gnu.org/software/gawk/manual/html_node/Array-Sorting-Functions.html):
function compare_by_letters(i1, v1, i2, v2,    left, right) {
    left = siw(v1)
    right = siw(v2)
    if (left < right)
        return -1
    else if (left == right)
        return 0
    else
        return 1
}
Then pass this function to gawk's asort function:
asort(array_test, array_test_result, "compare_by_letters")
Then, the sample program is:
# Return the letters of word in sorted order, caching results in hash_word.
function siw(word,    result, arr, arrlen, arridx) {
    result = hash_word[word]
    if (result != "") {
        return result
    }
    split(word, arr, "")
    arrlen = asort(arr)
    for (arridx = 1; arridx <= arrlen; arridx++) {
        result = result arr[arridx]
    }
    hash_word[word] = result
    return result
}
# Compare two words by their sorted letters.
function compare_by_letters(i1, v1, i2, v2,    left, right) {
    left = siw(v1)
    right = siw(v2)
    if (left < right)
        return -1
    else if (left == right)
        return 0
    else
        return 1
}
# Collect every input line.
{
    array_test[i++] = $0
}
END {
    alen = asort(array_test, array_test_result, "compare_by_letters")
    for (aind = 1; aind <= alen; aind++) {
        print array_test_result[aind]
    }
}
Executed like this:
echo -e "fail\nhello\nborn" | awk -f sort_letter.awk
Output:
fail
born
hello
Of course, if you have a large input, you can adapt the siw function to memoize its results for faster computation:
function siw(word,    result, arr, arrlen, arridx) {
    result = hash_word[word]          # reuse a cached result if there is one
    if (result != "") {
        return result
    }
    split(word, arr, "")
    arrlen = asort(arr)
    for (arridx = 1; arridx <= arrlen; arridx++) {
        result = result arr[arridx]
    }
    hash_word[word] = result          # cache for later calls
    return result
}
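As an alternative sketch that avoids gawk's asort, the same word ordering can be produced with a Decorate-Sort-Undecorate pipeline: awk computes the sorted-letter key once per word, sort orders the decorated lines, and cut removes the key. The names sort_words_by_letters and sorted_key are hypothetical, and the sketch assumes one ASCII word per line with no tab characters.

```shell
# sort_words_by_letters: read one word per line on stdin and print the words
# ordered by their sorted-letter key. Hypothetical helper name.
sort_words_by_letters() {
  awk '
    # Return the characters of w in ascending order (insertion sort).
    function sorted_key(w,    n, i, j, ch, c, out) {
      n = length(w)
      for (i = 1; i <= n; i++) c[i] = substr(w, i, 1)
      for (i = 2; i <= n; i++) {
        ch = c[i]
        for (j = i - 1; j >= 1 && c[j] > ch; j--) c[j + 1] = c[j]
        c[j + 1] = ch
      }
      out = ""
      for (i = 1; i <= n; i++) out = out c[i]
      return out
    }
    # Decorate: prefix each word with its key and a tab.
    { print sorted_key($0) "\t" $0 }
  ' |
    LC_ALL=C sort |   # sort the decorated lines in byte order
    cut -f 2-         # undecorate: drop the key column
}

printf 'fail\nhello\nborn\n' | sort_words_by_letters
```

For the sample input this prints fail, born, hello, matching the output above. Because each key is computed exactly once per input line, no memoization is needed in this variant.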