How to sort 2 arrays in bash

Time:11-16

I want to sort 2 arrays at the same time. The arrays are the following: wordArray and numArray. Both are global.

These two arrays hold every distinct word from a text file and the number of times each word appears in it.

Right now I am using Bubble Sort to sort both of them at the same time:

# Bubble Sort function
function bubble_sort {   
    local max=${#numArray[@]}
    size=${#numArray[@]}
    while ((max > 0))
    do
        local i=0
        while ((i < max))
        do
            if [ "$i" != "$(($size-1))" ] 
            then
                if [ ${numArray[$i]} \< ${numArray[$((i + 1))]} ]
                then
                    local temp=${numArray[$i]}
                    numArray[$i]=${numArray[$((i + 1))]}
                    numArray[$((i + 1))]=$temp

                    local temp2=${wordArray[$i]}
                    wordArray[$i]=${wordArray[$((i + 1))]}
                    wordArray[$((i + 1))]=$temp2
                fi
            fi
            ((i += 1))
        done
        ((max -= 1))
    done
}

#Calling Bubble Sort function
bubble_sort "${numArray[@]}" "${wordArray[@]}"

But for some reason it doesn't sort them properly when the arrays are large.

Does anyone know what's wrong with it, or is there another approach to sort the words together with their corresponding number of appearances, with or without arrays?

This:

wordArray = (because, maybe, why, the)
numArray = (5, 12, 20, 13)

Must turn to this:

wordArray = (why, the, maybe, because)
numArray = (20, 13, 12, 5)

Someone recommended writing the two arrays side by side in a text file and sorting the file.

How will it work for this input:

1 Arthur
21 Zebra

to turn to this output:

21 Zebra
1 Arthur

CodePudding user response:

Assuming the array elements do not contain tab or newline characters, how about:

#!/bin/bash

wordArray=(why the maybe because)
numArray=(20 13 12 5)

tmp1=$(mktemp tmp.XXXXXX)                               # file to be sorted
tmp2=$(mktemp tmp.XXXXXX)                               # sorted result

for (( i = 0; i < ${#wordArray[@]}; i++ )); do
    echo "${numArray[i]}"$'\t'"${wordArray[i]}"         # write the number and word delimited by a tab character
done > "$tmp1"

sort -nrk1,1 "$tmp1" > "$tmp2"                          # sort the file by number in descending order

while IFS=$'\t' read -r num word; do                    # read the lines splitting by the tab character
    numArray_sorted+=("$num")                           # add the number to the array
    wordArray_sorted+=("$word")                         # add the word to the array
done < "$tmp2"

rm -- "$tmp1"                                           # unlink the temp file
rm -- "$tmp2"                                           # same as above

echo "${wordArray_sorted[@]}"                           # see the result
echo "${numArray_sorted[@]}"                            # same as above

Output:

why the maybe because
20 13 12 5

If you prefer not to create temp files, here is the process substitution version, which will run faster without writing/reading temp files.

#!/bin/bash

wordArray=(why the maybe because)
numArray=(20 13 12 5)

while IFS=$'\t' read -r num word; do
    numArray_sorted+=("$num")
    wordArray_sorted+=("$word")
done < <(
    sort -nrk1,1 < <(
        for (( i = 0; i < ${#wordArray[@]}; i++ )); do
            echo "${numArray[i]}"$'\t'"${wordArray[i]}"
        done
    )
)

echo "${wordArray_sorted[@]}"
echo "${numArray_sorted[@]}"

Or simpler (using the suggestion by KamilCuk):

#!/bin/bash

wordArray=(why the maybe because)
numArray=(20 13 12 5)

while IFS=$'\t' read -r num word; do
    numArray_sorted+=("$num")
    wordArray_sorted+=("$word")
done < <(
    paste <(printf "%s\n" "${numArray[@]}") <(printf "%s\n" "${wordArray[@]}") | sort -nrk1,1
)

echo "${wordArray_sorted[@]}"
echo "${numArray_sorted[@]}"
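
For the unsorted input from the question, the same paste-and-sort idea could be wrapped in a function that rewrites the two global arrays in place. This is only a sketch: the function name sort_by_count is made up for illustration, and it still assumes that no element contains a tab or a newline.

```shell
#!/bin/bash

# Example data from the question (unsorted).
wordArray=(because maybe why the)
numArray=(5 12 20 13)

# sort_by_count is a hypothetical helper name, not part of the answer above.
# It rebuilds both global arrays in descending order of count.
sort_by_count() {
    local -a nums=() words=()
    local num word
    while IFS=$'\t' read -r num word; do
        nums+=("$num")
        words+=("$word")
    done < <(
        # Pair each count with its word, then sort numerically, largest first.
        paste <(printf '%s\n' "${numArray[@]}") <(printf '%s\n' "${wordArray[@]}") |
            sort -t $'\t' -k1,1nr
    )
    numArray=("${nums[@]}")
    wordArray=("${words[@]}")
}

sort_by_count
echo "${wordArray[@]}"    # why the maybe because
echo "${numArray[@]}"     # 20 13 12 5
```

Because sort compares the first field numerically, a pair such as "1 Arthur" and "21 Zebra" also comes out with 21 first.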

CodePudding user response:

You need numeric sort for the numbers. You can sort an array like this:

mapfile -t numArray < <(printf '%s\n' "${numArray[@]}" | sort -n)
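
A likely reason the bubble sort in the question misorders the counts is that \< inside [ ] compares its operands as strings, not as numbers, so "5" is treated as greater than "12". The small sketch below only illustrates that difference; the variables a, b, string_less and numeric_less are made up for the demonstration.

```shell
#!/bin/bash

a=5
b=12

# String comparison, as used in the question's bubble sort:
# "5" is NOT less than "12" because the character '5' sorts after '1'.
if [ "$a" \< "$b" ]; then string_less=yes; else string_less=no; fi

# Arithmetic comparison: 5 is less than 12.
if (( a < b )); then numeric_less=yes; else numeric_less=no; fi

echo "string  comparison says 5 < 12: $string_less"     # no
echo "numeric comparison says 5 < 12: $numeric_less"    # yes
```

Inside the bubble sort, an arithmetic test such as (( numArray[i] < numArray[i + 1] )) would compare the counts numerically.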

But what you actually need is something like:

for num in "${numArray[@]}"; do
    echo "$num: ${wordArray[j++]}"
done |
sort -nr -k1,1

But earlier in the process you should have used a single array in which the word and its frequency (or vice versa) are stored as key-value pairs. Then they always stay directly related and can be printed with a loop similar to the one above.
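
A minimal sketch of that single-array idea, assuming bash 4 or later for associative arrays. The sample text, the array name freq and the variable sorted are all invented for the example; in the question the words would come from the text file.

```shell
#!/bin/bash

# Associative arrays need bash 4 or later.
declare -A freq=()

# Hypothetical sample text standing in for the file contents.
text="the why the maybe why the"

# Count each word: the word is the key, its number of appearances the value.
for word in $text; do
    freq[$word]=$(( ${freq[$word]:-0} + 1 ))
done

# Print "count<TAB>word" pairs, highest count first (ties broken by word).
sorted=$(
    for word in "${!freq[@]}"; do
        printf '%s\t%s\n' "${freq[$word]}" "$word"
    done | sort -t $'\t' -k1,1nr -k2,2
)
echo "$sorted"
```

With this layout a word and its count can never get out of step, so nothing has to be sorted "at the same time".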
