I am currently learning Rust (using the Rust book), and one page mentions counting the number of times the sorting key was used while sorting an array. I modified the code in order to count this for arbitrary sizes, and here is the code :
fn main() {
const MAX: i32 = 10000;
for n in 1..MAX {
let mut v: Vec<i32> = (1..n).collect();
let mut ops = 0;
v.sort_by(|x, y| {
ops = 1;
x.cmp(y)
});
if n-2 >= 0 {
assert_eq!(n-2, ops);
}
// println!("A list of {n} elements is sorted in {ops} operations");
}
}
However, it seems that in order to sort a vector of n elements, Rust only needs n-2 comparaisons (the code above runs without panicking). How can this be possible ? Aren't sorts supposed to be in O(n*log(n)) ? Is it because Rust somehow "noticed" that my input vector was already sorted ?
Even in that case, how can a vector of length 2 can be sorted without any comparaisons ? Shouldn't it at least be n-1 ?
CodePudding user response:
The biggest misconseption you have, I think, is:
fn main() {
const SIZE: i32 = 5;
let v: Vec<i32> = (1..SIZE).collect();
println!("{}", v.len());
}
4
The range 1..SIZE
does not include SIZE
and contains SIZE-1
elements.
Further, it will already be sorted, so it's as simple as iterating through it once.
See here:
fn main() {
const SIZE: i32 = 5;
let mut v: Vec<i32> = (1..SIZE).collect();
let mut ops = 0;
v.sort_by(|x, y| {
ops = 1;
let result = x.cmp(y);
println!(" - cmp {} vs {} => {:?}", x, y, result);
result
});
println!("Total comparisons: {}", ops);
}
- cmp 4 vs 3 => Greater
- cmp 3 vs 2 => Greater
- cmp 2 vs 1 => Greater
Total comparisons: 3
it seems that in order to sort a vector of n elements, Rust only needs n-2 comparaisons
That is incorrect. In order to sort an already sorted vector (which yours are), Rust needs n-1
comparisons. It doesn't detect that, that's just an inherent property of the mergesort implementation that Rust uses.
If it isn't already sorted, it will be more:
fn main() {
let mut v: Vec<i32> = vec![2, 4, 1, 3];
let mut ops = 0;
v.sort_by(|x, y| {
ops = 1;
let result = x.cmp(y);
println!(" - cmp {} vs {} => {:?}", x, y, result);
result
});
println!("Total comparisons: {}", ops);
}
- cmp 3 vs 1 => Greater
- cmp 1 vs 4 => Less
- cmp 3 vs 4 => Less
- cmp 1 vs 2 => Less
- cmp 3 vs 2 => Greater
Total comparisons: 5
CodePudding user response:
FYI sort_by
:
pub fn sort_by<F>(&mut self, mut compare: F)
where
F: FnMut(&T, &T) -> Ordering,
{
merge_sort(self, |a, b| compare(a, b) == Less);
}
and it actually invokes merge_sort
:
/// This merge sort borrows some (but not all) ideas from TimSort, which is described in detail
/// [here](https://github.com/python/cpython/blob/main/Objects/listsort.txt).
///
/// The algorithm identifies strictly descending and non-descending subsequences, which are called
/// natural runs. There is a stack of pending runs yet to be merged. Each newly found run is pushed
/// onto the stack, and then some pairs of adjacent runs are merged until these two invariants are
/// satisfied:
///
/// 1. for every `i` in `1..runs.len()`: `runs[i - 1].len > runs[i].len`
/// 2. for every `i` in `2..runs.len()`: `runs[i - 2].len > runs[i - 1].len runs[i].len`
///
/// The invariants ensure that the total running time is *O*(*n* \* log(*n*)) worst-case.
#[cfg(not(no_global_oom_handling))]
fn merge_sort<T, F>(v: &mut [T], mut is_less: F)
how can a vector of length 2 be sorted without any comparisons? Shouldn't it at least be n-1?
(1..2)
returns a slice of length 1 (start from 1, but less than 2). So, when n == 2
in your code, please note that the length of the vector is one.
Let me demonstrate how it will actually go in the merge_sort
if the input is a slice shorter than or equal to 2.
// MAX_INSERTION: 20
if len <= MAX_INSERTION {
// if the len is less than 1, it won't use `is_less` closure to let you count the cmp.
if len >= 2 {
for i in (0..len - 1).rev() {
insert_head(&mut v[i..], &mut is_less); // <- go into `insert_head`.
}
}
return;
}
fn insert_head<T, F>(v: &mut [T], is_less: &mut F)
where
F: FnMut(&T, &T) -> bool,
{
if v.len() >= 2 && is_less(&v[1], &v[0]) // <- here it uses the closure to make comparison.
So if your input is less than 21, short arrays will get sorted in place via insertion sort to avoid allocations.