Can I assume that Thrust stable_sort_by_key
performed on unsigned int
has complexity O(n)
? If not what should I do to be sure that this complexity will be achieved? (Except of implementing radix sort on my own)
CodePudding user response:
It depends a bit on your circumstances/view. Just from the docs/API there doesn't seem to be a guarantee for thrust::stable_sort_by_key
on unsigned int
keys using a radix sort.
On the other hand the necessary algorithm cub::DeviceRadixSort::SortPairs
is implemented in the CUB library which is used by Thrust in the backend and there is no good reason for Thrust not to use it as the prerequisites can easily be queried at compile time.
From the code in thrust/system/cuda/detail/sort.h
(the "detail" should warn you that this is not part of the public API) one can see that thrust::stable_sort_by_key
can launch a cub::DeviceRadixSort::SortPairs
under the right circumstances (arithmetic key type and using thrust::less
or thrust::greater
as comparison operation) at least on the main branch of Thrust at the time of writing (ca. v2.0.0rc2) but probably for a long time already. Else it will fall back to a merge sort.
Directly using cub::DeviceRadixSort::SortPairs
could have benefits even if this is enough for you, as this makes it easier to reuse temporary buffers and avoid unnecessary synchronization. Both can be done in Thrust via a
auto exec = thrust::cuda::par_nosync(custom_allocator).on(custom_stream);
execution policy (the most recent CUDA Toolkit 11.8 still comes with old Thrust v1.15 without v1.16 par_nosync
). One thing that can not be avoided using Thrust is the in-place nature of the sorting algorithms which is achieved by potentially copying the results back to the input buffers. These copy operations can only be elided using CUB.