What are the best sorting algorithms when 'n' is very small?-CodePudding

In the critical path of my program, I need to sort an array (specifically, a C std::vector<int64_t>, using the gnu c standard libray). I am using the standard library provided sorting algorithm (std::sort), which in this case is introsort.

I was curious about how well this algorithm performs, and when doing some research on various sorting algorithms different standard and third party libraries use, almost all of them care about cases where 'n' tends to be the dominant factor.

In my specific case though, 'n' is going to be on the order of 2-20 elements. So the constant factors could actually be dominant. And things like cache effects might be very different when the entire array we are sorting fits into a couple of cache lines.

What are the best sorting algorithms for cases like this where the constant factors likely overwhelm the asymptotic factors? And do there exist any vetted C implementations of these algorithms?

CodePudding user response：

Insertion sort or selection sort are both typically faster for small arrays (i.e., fewer than 10-20 elements).

CodePudding user response：

It's impossible to know the fastest way to do anything without knowing exactly what the "anything" is.

Here is one possible set of assumptions:

We don't have any knowledge of the element structure except that elements are comparable. We have no useful way to group them into bins (for radix sort), we must implement a comparison-based sort, and comparison takes place in an opaque manner.
We have no information about the initial state of the input; any input order is equally likely.
We don't have to care about whether the sort is stable.
The input sequence is a simple array. Accessing elements is constant-time, as is swapping them. Furthermore, we will benchmark the function purely according to the expected number of comparisons - not number of swaps, wall-clock time or anything else.

With that set of assumptions (and possibly some other sets), the best algorithms for small numbers of elements will be hand-crafted sorting networks, tailored to the exact length of the input array. (These always perform the same number of comparisons; it isn't feasible to "short-circuit" these algorithms conditionally because the "conditions" would depend on detecting data that is already partially sorted, which still requires comparisons.)

For a network sorting four elements (in the known-optimal five comparisons), this might look like (I did not test this):

template<class RandomIt, class Compare>
void _compare_and_swap(RandomIt first, Compare comp, int x, int y) {
    if (comp(first[x], first[y])) {
        auto tmp = first[x];
        arr[x] = arr[y];
        arr[y] = tmp;
    }
}

// Assume there are exactly four elements available at the `first` iterator.
template<class RandomIt, class Compare>
void network_sort_4(RandomIt first, Compare comp) {
    _compare_and_swap(2, 0);
    _compare_and_swap(1, 3);
    _compare_and_swap(0, 1);
    _compare_and_swap(2, 3);
    _compare_and_swap(1, 2);
}

In real-world environments, of course, we will have different assumptions. For small numbers of elements, with real data (but still assuming we must do comparison-based sorts) it will be difficult to beat naive implementations of insertion sort (or bubble sort, which is effectively the same thing) that have been compiled with good optimizations. It's really not feasible to reason about these things by hand, considering both the complexity of the hardware level (e.g. the steps it takes to pipeline instructions and then compensate for branch mis-predictions) and the software level (e.g. the relative cost of performing the swap vs. performing the comparison, and the effect that has on the constant-factor analysis of performance).

CodePudding user response：

Introsort takes your concern into account, and switches to an insertion sort implementation for short sequences.

Since your STL already provides it, you should probably use that.