How to sort the characters of a word using awk?

I can't find a way to sort the characters of a word in awk. For example, if the word is "hello", its sorted equivalent is "ehllo". How can I achieve this in awk?

CodePudding user response：

With GNU awk for PROCINFO[], "sorted_in" (see https://www.gnu.org/software/gawk/manual/gawk.html#Controlling-Scanning) and splitting with a null separator resulting in an array of chars:

$ echo 'hello' |
awk '
    BEGIN { PROCINFO["sorted_in"]="@val_str_asc" }
    {
        split($1,chars,"")
        word = ""
        for (i in chars) {
            word = word chars[i]
        }
        print word
    }
'
ehllo

$ echo 'hello' | awk -v ordr='@val_str_asc' 'BEGIN{PROCINFO["sorted_in"]=ordr} {split($1,chars,""); word=""; for (i in chars) word=word chars[i]; print word}'
ehllo

$ echo 'hello' | awk -v ordr='@val_str_desc' 'BEGIN{PROCINFO["sorted_in"]=ordr} {split($1,chars,""); word=""; for (i in chars) word=word chars[i]; print word}'
ollhe
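
If gawk is not available, a similar result can probably be had in plain POSIX awk by extracting the characters with substr() and sorting them by hand. The following is only a minimal sketch under that assumption: sort_chars is a hypothetical helper name, and the insertion sort is swapped in for PROCINFO["sorted_in"], which is gawk-only.

```shell
# sort_chars is a hypothetical helper, not part of the answer above.
# It sorts the characters of each input line using only POSIX awk features.
sort_chars() {
    printf '%s\n' "$1" | awk '
    {
        n = length($0)
        # Copy each character into a 1-based array.
        for (i = 1; i <= n; i++) c[i] = substr($0, i, 1)
        # Insertion sort; appending "" forces a string comparison.
        for (i = 2; i <= n; i++) {
            key = c[i]
            j = i - 1
            while (j >= 1 && (c[j] "") > (key "")) {
                c[j + 1] = c[j]
                j--
            }
            c[j + 1] = key
        }
        # Join the sorted characters back into one word.
        out = ""
        for (i = 1; i <= n; i++) out = out c[i]
        print out
    }'
}

sort_chars hello
```

The insertion sort is quadratic in the word length, which should be acceptable for ordinary words but is not meant for very long lines.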

CodePudding user response：

Another option is a Decorate-Sort-Undecorate with sed. Essentially, you use sed to break "hello" into one character per line (decorating each character with a newline '\n') and pipe the result to sort. You then use sed again to do the reverse (undecorating each line by removing the '\n') and join the lines back together.

printf "hello" | sed 's/\(.\)/\1\n/g' | sort | sed '{:a N;s/\n//;ta}'
ehllo

There are several approaches you can use. This one is shell-friendly, but it relies on GNU sed behavior, such as '\n' in the replacement text.
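
For systems without GNU sed, the same split, sort and join pipeline can be sketched with fold and tr instead. This is an illustration rather than part of the answer above: sort_chars_fold is a hypothetical name, and the sketch assumes single-byte characters, since fold may not handle multibyte input the way you expect.

```shell
# sort_chars_fold is a hypothetical helper, not part of the answer above.
# Assumes single-byte characters.
sort_chars_fold() {
    # fold -w 1 puts each character on its own line, sort orders the lines,
    # and tr -d '\n' joins them back together.
    printf '%s\n' "$1" | fold -w 1 | sort | tr -d '\n'
    # tr removed every newline, so add one back at the end.
    echo
}

sort_chars_fold hello
```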

CodePudding user response：

This is easier with gawk, which provides the asort function to sort an array:

awk 'BEGIN{FS=""}{n=split($0,a);asort(a);for(i=1;i<=n;i++)printf "%s",a[i];print ""}'<<<hello

This outputs:

ehllo

Demo: https://ideone.com/ylWQLJ

CodePudding user response：

You need to write a function that sorts the letters of a word (see: https://www.gnu.org/software/gawk/manual/html_node/Join-Function.html):

function siw(word,        result, arr, arrlen, arridx) {
    split(word, arr, "")
    arrlen = asort(arr)
    for (arridx = 1; arridx <= arrlen; arridx++) {
        result = result arr[arridx]
    }
    return result
}

Then define a comparison function that compares two words by their sorted letters (see: https://www.gnu.org/software/gawk/manual/html_node/Array-Sorting-Functions.html):

function compare_by_letters(i1, v1, i2, v2,        left, right) {
    left  = siw(v1)
    right = siw(v2)
    if (left < right)
        return -1
    else if (left == right)
        return 0
    else
        return 1
}

Then pass the name of this function to gawk's asort function:

asort(array_test, array_test_result, "compare_by_letters")

Then, the sample program is:

function siw(word,        result, arr, arrlen, arridx) {
    result = hash_word[word]
    if (result != "") {
        return result
    }
    split(word, arr, "")
    arrlen = asort(arr)
    for (arridx = 1; arridx <= arrlen; arridx++) {
        result = result arr[arridx]
    }
    hash_word[word] = result
    return result
}

function compare_by_letters(i1, v1, i2, v2,        left, right) {
    left  = siw(v1)
    right = siw(v2)
    if (left < right)
        return -1
    else if (left == right)
        return 0
    else
        return 1
}

{
    array_test[i++] = $0
}

END {
    alen = asort(array_test, array_test_result, "compare_by_letters")
    for (aind = 1; aind <= alen; aind++) {
        print array_test_result[aind]
    }
}

Executed like this:

echo -e "fail\nhello\nborn" | awk -f sort_letter.awk

Output:

fail
born
hello

Of course, if you have large input, you can adapt the siw function to memoize its results (here in the global array hash_word), so that each word is sorted only once:

function siw(word,        result, arr, arrlen, arridx) {
    result = hash_word[word]
    if (result != "") {
        return result
    }
    split(word, arr, "")
    arrlen = asort(arr)
    for (arridx = 1; arridx <= arrlen; arridx++) {
        result = result arr[arridx]
    }
    hash_word[word] = result
    return result
}