Home > Net >  change optimal base of radix/counting sort?
change optimal base of radix/counting sort?

Time:09-30

Im learning to implement radix sort but im having trouble understanding how to change to a base different from the optimal base log(n) on base 10 to 2^log(n), with a floor function on the ln. Im using this code from GeeksForGeeks which tend to follow the CLRS pseudocodes, to learn:

// C   implementation of Radix Sort
 
#include <iostream>
using namespace std;
 
// A utility function to get maximum value in arr[]
int getMax(int arr[], int n)
{
    int mx = arr[0];
    for (int i = 1; i < n; i  )
        if (arr[i] > mx)
            mx = arr[i];
    return mx;
}
 
// A function to do counting sort of arr[] according to
// the digit represented by exp.
void countSort(int arr[], int n, int exp)
{
    int output[n]; // output array
    int i, count[10] = { 0 };
 
    // Store count of occurrences in count[]
    for (i = 0; i < n; i  )
        count[(arr[i] / exp) % 10]  ;
 
    // Change count[i] so that count[i] now contains actual
    //  position of this digit in output[]
    for (i = 1; i < 10; i  )
        count[i]  = count[i - 1];
 
    // Build the output array
    for (i = n - 1; i >= 0; i--) {
        output[count[(arr[i] / exp) % 10] - 1] = arr[i];
        count[(arr[i] / exp) % 10]--;
    }
 
    // Copy the output array to arr[], so that arr[] now
    // contains sorted numbers according to current digit
    for (i = 0; i < n; i  )
        arr[i] = output[i];
}
 
// The main function to that sorts arr[] of size n using
// Radix Sort
void radixsort(int arr[], int n)
{
    // Find the maximum number to know number of digits
    int m = getMax(arr, n);
 
    // Do counting sort for every digit. Note that instead
    // of passing digit number, exp is passed. exp is 10^i
    // where i is current digit number
    for (int exp = 1; m / exp > 0; exp *= 10)
        countSort(arr, n, exp);
}
 
// A utility function to print an array
void print(int arr[], int n)
{
    for (int i = 0; i < n; i  )
        cout << arr[i] << " ";
}
 
// Driver Code
int main()
{
    int arr[] = { 170, 45, 75, 90, 802, 24, 2, 66 };
    int n = sizeof(arr) / sizeof(arr[0]);
     
      // Function Call
      radixsort(arr, n);
    print(arr, n);
    return 0;
}

My main problem is that i dont know what 2^log(n) mean on this function, like i understand how to change it to binary base or hexadecimal base, but how does this base change the algorithm compared to base 10 or any of the other ones? why is this base considered close to the optimal one? And how would i even implement it?

CodePudding user response:

The base is the factor by which you would divide a number repeatedly to identify its digits in that base.

For instance when base is 10, and we have the number 12, then we can identify the least significant digit by dividing it by the base, which gives us 1 with remainder 2. That remainder is the least significant digit. Then to get the next digit (in right-to-left sense), we divide the previous quotient again by the base, so in the example we get 0 with remainder 1. This time the digit is 1, and the process stops because the quotient is 0.

If we change the base to 2 for analysing the same number 12, then we identify the least significant (binary) digit by applying a division by 2:

  • 12 / 2 = 6, remainder 0
  • 6 / 2 = 3, remainder 0
  • 3 / 2 = 1, remainder 1
  • 1 / 2 = 0, remainder 1

Reading the remainders from bottom to top, we get 1100, which is the binary representation of 12.

This is to say that in your algorithm, you can easily change the base to another base by modifying the occurrences of 10 where they appear in % 10, / 10, * 10, or in the number of the "buckets" (the size of the count array).

So here is your code altered so that it defines a BASE, and you can just change it to whichever base you like:

#define BASE 2

void countSort(int arr[], int n, int exp)
{
    int output[n];
    int i, count[BASE] = { 0 }; // The number of buckets is BASE
 
    for (i = 0; i < n; i  )
        count[(arr[i] / exp) % BASE]  ; // Division is by BASE
 
    for (i = 1; i < BASE; i  ) // The number of buckets is BASE
        count[i]  = count[i - 1];
 
    for (i = n - 1; i >= 0; i--) {
        // Division is by BASE
        output[count[(arr[i] / exp) % BASE] - 1] = arr[i];
        count[(arr[i] / exp) % BASE]--;
    }
 
    for (i = 0; i < n; i  )
        arr[i] = output[i];
}
 
void radixsort(int arr[], int n)
{
    int m = getMax(arr, n);
    // Power has BASE as base:
    for (int exp = 1; m / exp > 0; exp *= BASE)
        countSort(arr, n, exp);
}

As to what is most efficient: in terms of time complexity it doesn't make a difference which base you choose. If however there is more information about the input than just that they are integers, then it might be that for certain ranges and distributions of input one base may in practice give faster results than another.

  • Related